


SIGPLAN 

NOTICES 


A Monthly Publication of
the Special Interest Group
on Programming Languages


Volume 16, Number 6, June 1981 


PROCEEDINGS OF THE
ACM SIGPLAN SIGOA SYMPOSIUM 
ON TEXT MANIPULATION 
PORTLAND, OREGON 
JUNE 8-10, 1981 


ACM Order No. 548810 



SIGPLAN NOTICES 




special interest group on programming languages 


SIGPLAN Notices is an informal monthly publication of the Special 
Interest Group on Programming Languages (SIGPLAN) of the As- 
sociation for Computing Machinery. 

Membership in SIGPLAN is open to ACM Members or associate 
members for $16 per year. It is also open to others whose major 
professional allegiance is in a field other than information process- 
ing or computing for $30 per year. All SIGPLAN members receive 
SIGPLAN Notices, are given discounts at SIGPLAN-sponsored 
meetings, and may vote in the Group's biennial elections. ACM 
members of SIGPLAN may serve as officers of the group. 

Institutional or Library subscriptions to SIGPLAN Notices are 
available for $35 per year, and the regular back issues of the 
Notices may be purchased for $3 per copy from ACM Headquarters.
SIGPLAN symposium and conference proceedings issues of the 
Notices are available, at different prices, from ACM Headquarters. 

Members of the Special Technical Committee on APL (STAPL),
a SIGPLAN-affiliated users group, receive the APL Quote-Quad, an
informal quarterly publication of interest to APL users.

All correspondence concerning STAPL should be directed to the 
committee chairman, Eugene E. McDonnell, I. P. Sharp Assoc. Ltd., 
220 California Ave., Suite 201, Palo Alto, CA 94306.

CONTRIBUTIONS TO SIGPLAN Notices should be sent to the 
editor. They should be on standard letter-size pages and camera 
ready. Typed contributions should be single spaced. Other accept- 
able formats include photoreduced double column or "2-up" (two 
physical pages reduced to fit one page). Articles that contain too 
much blank space will be rejected. It is the author's responsibility
to retain a working copy of his paper, as contributions will not be
returned to authors. Authors not fluent in writing English are re- 
quested to have their work reviewed and corrected for style and 
syntax prior to submission. 

The current page limit on contributions is 15 pages, and is 
enforced. Collections of articles from single sources are discour- 
aged, and newsletters will in general not be reproduced. Newsletter 
editors may contact the SIGPLAN Notices editor for information 
regarding publication of article summaries and tables of contents 
from their newsletters. Individual newsletter articles may be sub- 
mitted as contributions. 

Letters to the Editor will be considered as submitted for publica- 
tion unless they contain a request to the contrary. Letters that are 
unprofessional, discourteous, excessively long, or irrelevant will not 
be published. 

Technical papers appearing in this issue are unrefereed working 
papers, and all contributions are ordinarily to be construed as 
personal rather than organizational statements. Authors planning to 
submit to a journal any manuscript appearing in SIGPLAN Notices 
should make certain that the journal version differs sufficiently in 
content to satisfy the editor of that journal. All questions regarding 
journal policy on possible duplicate publications should be directed 
to the editor of the journal in question. 

PROSPECTIVE ORGANIZERS OF SIGPLAN TECHNICAL MEETINGS
are referred to the Volume 15, No. 3, March 1980 issue of
SIGPLAN Notices in which the "Suggested Guidelines for SIGPLAN
Technical Meetings" were published.


CHANGES OF ADDRESS AND QUESTIONS 
PERTAINING TO THE MAILING OF SIGPLAN 
Notices should be sent to ACM Headquarters: 

ACM SIGPLAN 
1133 Avenue of the Americas
New York, New York 10036
Phone: 212/265-6300


EXECUTIVE COMMITTEE 

Chairman 
Paul Abrahams 
214 River Road 
Deerfield, MA 01342 
(413) 774-5500 


Vice Chairman 
John R. White 
Elect. Eng. & Comp. Sci. 
University of Connecticut 
Storrs, CT 06268 
(203) 486-2572/4816 


Secretary-Treasurer
Mary Van Deusen 
Prime Computer, Inc. 

500 Old Connecticut Path 
Framingham, MA 01701 
(617) 879-2960 


Members 
Allen Ambler 
Dialogic Systems 
3910 Freedom Circle 
#102B 

Santa Clara, CA 95050 
(408) 988-8116 

Barbara Liskov 
Lab. for Comp. Sci. 

M.I.T.

545 Technology Square 
Cambridge, MA 02139 
(617) 253-5886 

Mary Shaw 

Computer Science Department 
Carnegie-Mellon University 
Schenley Park 
Pittsburgh, PA 15213 
(412) 578-2589 


Editor 

Victor B. Schneider 
Wang Laboratories 
M/S: 1379 
1 Industrial Avenue 
Lowell, MA 01851 
(617) 459-5000 X 5586

Past Chairman 
Stephen N. Zilles 
IBM Research Lab 
K-54/282 

Monterey & Cottle Roads 
San Jose, CA 95193 
(408) 256-7559 

STAPL Chairman 
Eugene E. McDonnell 
I. P. Sharp Assoc. Ltd. 
220 California Ave. 

Suite 201 

Palo Alto, CA 94306 
(415) 321-5542 


Note: This issue was received 
at ACM HQ for production on 


Composition and Design 
Muriel M. Furman 
The Aerospace Corporation 
Post Office Box 92957 
Los Angeles, CA 90009 
(213) 648-5405 


SIGPLAN INSTITUTIONAL 
SPONSORS 

Prime Computer, Inc., MA 







The Association for Computing Machinery, Inc. 
1133 Avenue of the Americas 
New York, New York 10036 


Copyright © 1981 by the Association for Computing Machinery, Inc. Copying without fee is permitted provided
that the copies are not made or distributed for direct commercial advantage and credit to the source is
given. Abstracting with credit is permitted. For other copying of articles that carry a code at the bottom of the 
first page, copying is permitted provided that the per-copy fee indicated in the code is paid through the 
Copyright Clearance Center, P.O. Box 765, Schenectady, N.Y. 12301. For permission to republish write to:
Director of Publications, Association for Computing Machinery. To copy otherwise, or republish, requires a 
fee and/or specific permission. 

ISBN 0-89791-043-5 


Additional copies may be ordered prepaid from: 


ACM Order Department
P.O. Box 64145
Baltimore, MD 21264

Price: Members $12.50, Non-Members $15.00


ACM Order No. 548810 


Preface 





The use of computers for text manipulation is becoming as prevalent as their use for 
numeric computation. Commercially developed word processing systems have become 
everyday tools. Document preparation systems are now a standard facility at research insti- 
tutions, and they have become quite common on personal computers. Every paper in these 
proceedings was prepared using some kind of document preparation system. Moreover, 
more than half of them were prepared on phototypesetters. 

The ACM SIGPLAN/SIGOA Symposium on Text Manipulation is intended to bring 
together recent research and experimental developments in this area. The 20 papers 
presented at this symposium were selected from 77 papers submitted. The selected papers 
cover a variety of topics: editing and document preparation systems, implementation 
methods, user interface design, automated stylistic analysis, creation of diagrams for inclu- 
sion in texts, and algorithms for screen updating, spelling correction, and line breaking. 
Both natural language and programming language texts are considered. 

When the program committee was evaluating the papers they observed that many 
authors were unaware of the existing literature on the subject. Since this literature is scat- 
tered and often hard to locate, the committee suggested that these proceedings include a 
bibliography. The bibliography at the end of the proceedings was prepared by Brian Reid 
and David Hanson, with assistance from Chris Fraser and Ben Shneiderman. We hope that 
it will enhance the value of this volume. 


Paul Abrahams 
Conference Chairman 


Program Committee: 

Russell Abbott, California State University, Northridge 

Paul Abrahams, Independent Consultant

Charles Geschke, Xerox Palo Alto Research Center 

David Hanson, University of Arizona 

Brian Kernighan, Bell Labs 

James King, IBM San Jose Research Laboratory 

Brian Reid, Stanford University 

Local Arrangements: 


Jon Meads, Intel Corporation 
Mayer Schwartz, Tektronix 




SIGPLAN/SIGOA Symposium on Text Manipulation 

Portland, Oregon, June 8-10, 1981 



Monday, June 8, 1981 

9:00 Opening Remarks. Paul Abrahams, Confer- 
ence Chairman 

9:15 Session I. Chairman: David Hanson, 
University of Arizona 
Z — the 95% program editor, Steven R.
Wood, Yale University ...1

The why and wherefore of the Cornell 
Program Synthesizer, Tim Teitelbaum, 
Thomas Reps, Susan Horwitz, Cornell 
University ...8 

Syntax-directed editing of general data 
structures, Christopher W. Fraser, Univer- 
sity of Arizona ...17 

The implementation and experiences of a 
structure-oriented text editor, Ola 
Stromfors and Lennart Jonesjo, Linkoping 
University ...22 

The design of a language-directed editor 
for block-structured languages, Joseph
M. Morris and Mayer D. Schwartz,
Tektronix, Inc. ...28

2:00 Session II. Chairman: Paul Abrahams, 
Independent Consultant
Etude and the folklore of user interface 
design, Michael Good, MIT ...34 

The document editor: a support environ- 
ment for preparing technical docu- 
ments, Janet H. Walker, Bolt Beranek and 
Newman Inc. ...44 

Checking for spelling and typographical 
errors in computer-based text, Thomas
N. Turba, Sperry Univac ...51

Computer aids for writers, Lorinda Cherry,
Bell Labs ...61

Tuesday, June 9, 1981 

9:00 Session III. Chairman: Charles Geschke, 
Xerox PARC 

A generalized approach to document 
markup, Charles F. Goldfarb, IBM ...68 
PEN: a hierarchical document editor, Todd 
Allen, Robert Nix and Alan Perlis, Yale 
University ...74 


JANUS: an interactive system for document
composition, D. D. Chamberlin, J. C. King,
D. R. Slutz, S. J. P. Todd, B. W. Wade, IBM ...82

PIC — a language for typesetting graphics, 
Brian W. Kernighan, Bell Labs ...92 

A graphics typesetting language, Christo- 
pher J. Van Wyk, Bell Labs ...99 

2:00 Session IV. Chairman: Brian Kernighan, 
Bell Labs 

Prettyprinting in an interactive program- 
ming environment, Martin Mikelsons, 
IBM ...108 

On the line breaking problem in text for- 
matting, James O. Achugbue, Michigan 
Technological University ...117 

3:45 Session V. Chairman: James King, IBM 
The real world, Robyn Shotwell, Shotwell and 
Associates 

TeX and METAFONT, Donald E. Knuth,
Stanford University 

Wednesday, June 10, 1981 

8:30 Session VI. Chairman: Brian Reid, Stanford 
University 

A redisplay algorithm, James Gosling, 
Carnegie-Mellon University ...123 

The design of the PEN video editor display 
module, David R. Barach, David H. 
Taenzer, Robert E. Wells, Bolt Beranek and 
Newman Inc. ...130 

The implementation of Etude, an
integrated and interactive document
preparation system, M. Hammer, R.
Ilson, T. Anderson, M. Good, L. Rosenstein,
B. Niamir, S. Schoichet, E. Gilbert,
MIT ...137

EMACS, the extensible, customizable self- 
documenting display editor, Richard M. 
Stallman, MIT ...147 

An annotated bibliography of background 
material on text manipulation, Brian K. 
Reid, Stanford University and David R. 
Hanson, University of Arizona ...157 



Z - The 95% Program Editor 


Steven R. Wood 

Department of Computer Science 
Yale University 

New Haven, Connecticut 06520 


Abstract 

Recently much attention has been focused on structure-oriented 
program editors that have specific knowledge about the syntax and 
semantics of a particular programming language [1, 4, 5, 18]. These
editors provide many desirable features for editing programs. 
However, the user interface is constrained by the syntax and semantics 
of the target language, and editing operations that are simple in a text 
editor can be quite complicated in a structure-oriented editor. In 
addition, the user has an editor that is limited to a single language and 
must use a different editor for text editing. Existing implementations of 
structure-oriented editors use a parse-tree representation for a program 
along with a supporting lexical analyzer, parser, and pretty-printer; this 
representation significantly complicates the implementation of an 
editor. 

We believe that the most natural representation of programs is text and 
that the editor should be able to take advantage of the same visual cues 
that programmers use to understand their programs. With a 
text-oriented model of program structure, the editor is both a program 
editor and a document editor. As a program editor it provides features 
to support many different programming languages, such as LISP, APL, 
PASCAL, and BLISS. As a document editor it provides basic 
word-processing functions such as text justification and spelling
correction. A text orientation considerably simplifies the design of the 
editor and presents the user with a simple but powerful model of 
program structure. This paper describes a text-oriented display editor 
called Z. Z is the production editor in the Yale Computer Science 
Department. 

1. Introduction 

Z is a display editor that over three hundred people have used for more 
than a year. Its predecessors have been in use for more than nine years. 
While it contains many of the features found in other display text 
editors [11, 15, 17, 9], Z also includes features frequently found in 
structure-oriented program editors and word-processing systems. 

The Z editor was designed for a computing environment containing two 
DECsystem-20 computers running a local version of the TOPS-20 
monitor. Connected to the machines, through full-duplex lines, are 
over 150 unintelligent ASCII display terminals, all running at 9600 
baud. Experience has shown that each system will support 30 to 40 
users during the peak hours with acceptable response time. At any one 
time, typically half the users are using the editor. 


The author was supported by an NCSS Fellowship 


Permission to copy without fee all or part of this material is granted 
provided that the copies are not made or distributed for direct 
commercial advantage, the ACM copyright notice and the title of the 
publication and its date appear, and notice is given that copying is by 
permission of the Association for Computing Machinery. To copy 
otherwise, or to republish, requires a fee and/or specific permission. 

© 1981 ACM 0-89791-043-5/81/0600/0001 $00.75


The user community at Yale consists of undergraduate and graduate 
students, faculty and secretarial staff. They have a long tradition of 
display text editing and expect high-speed response from a display 
editor. The users would not accept a new editor that did not provide 
more speed and more functions than existing editors. Z’s predecessors 
were simple text editors that did not provide any support for editing 
programs or documents. Our goal was to design an editor with the 
same high speed response of existing editors, while providing much 
more functionality in the area of program and document editing. 

Recently much attention has been focused on structure-oriented 
program editors that have specific knowledge about the syntax and 
semantics of a particular programming language [1, 4, 5, 18]. These 
editors provide many desirable features for editing programs, such as 
automatic pretty-printing based on the structure of the language; the 
ability to select complete syntactic units with one or two keystrokes; the 
ability to “zoom” out and suppress the lower level details of a program; 
and quick error detection with the cursor positioned at the location of 
the error. However, the user interface is constrained by the syntax and 
semantics of the target language, and editing operations that are simple 
in a text editor can be quite complicated in a structure-oriented editor. 
In addition, the user has an editor that is limited to a single language 
and must use a different editor for text editing. Existing 
implementations of structure-oriented editors use a parse-tree 
representation for a program along with a supporting lexical analyzer, 
parser, and pretty-printer; this representation significantly complicates 
the implementation of an editor. 

We believe that the most natural representation of programs is text and 
that the editor should be able to take advantage of the same visual cues 
that programmers use to understand their programs. The editor should 
impose no more structure on the text than required to implement each 
program-editing feature. With a text-oriented model of program 
structure, the editor is both a program editor and a document editor. 
As a program editor it provides features to support many different 
programming languages, such as LISP, APL, PASCAL, and BLISS. 
As a document editor it provides basic word-processing functions such 
as text justification and spelling correction. A text orientation 
considerably simplifies the design of the editor and presents the user 
with a simple but powerful model of program structure. 

The result is that Z is a very good text editor that supports many 
different programming languages as well. Experience has shown that Z 
is able to do 95% of what we would like a program editor to do without 
appreciably increasing the complexity of the editor. The overall design
philosophy for the editor is discussed in the next section. Section 3 
presents the text-editing features of the editor. The program-editing 
features are discussed in section 4. 




2. Design Philosophy 

The design of the user's view of our editor is based largely on our
accumulated experience with general-purpose text editors that use a 
display terminal. This section presents our view of what the important 
issues are in a good display-based editor and the approach we used in 
the design of Z. 

Many of the issues discussed in this section, such as argument selection 
and visual feedback, are discussed from the standpoint of the local 
computing environment. Specifically not mentioned is what type of 
editing features would be important on a personal computer with a 
bit-map display and a “mouse” pointing device [12]. While such an
environment may be an admirable goal, most of the real world in 
computing is still discovering the “glass” teletype. 

2.1. File Model 

Z presents each file as a quarter plane of text that extends “infinitely” in
width and length. The origin of the plane is the first character of the 
first line of the file. One does not see a file as a stream of characters that 
include newline and tab as editable characters. Rather one sees a file as 
an infinite array of infinitely wide lines. 

There are commands for positioning the display window anywhere in 
the plane and for positioning the cursor anywhere within the display. 
The cursor is the focus of each editing operation. An editing operation 
may be as simple as typing new characters over old ones or as complex 
as reformatting a paragraph of text. What the user sees on the screen at 
any moment is precisely what is present in the corresponding section of 
the file. Extending a file is a simple matter of typing past the last line of 
the file. The same is true for adding text past the end of a line. This is in 
direct contrast to stream display editors, such as MIT EMACS [17],
which map the file onto the display by showing continuation marks
and wrapping long lines onto succeeding lines. Stream editors don't 
allow the cursor to move beyond the end of a line or the end of the file. 
To add text past these boundaries, the user must insert enough spaces, 
tabs or newlines to get to where the new text is to be added. This is why 
most stream editors require that newlines and tabs be editable 
characters that modify the file. The result is a more restrictive view of 
the file than the quarter plane model. 
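
The quarter-plane model can be illustrated with a small sketch. This is not Z's implementation (Z was written in BLISS); the class name QuarterPlane and its methods are hypothetical, and the sketch assumes only what the text states: every position past the stored text reads as a blank, and typing past the end of a line or of the file simply extends it.

```python
class QuarterPlane:
    """Hypothetical model of a file as an unbounded grid of characters."""

    def __init__(self, lines=None):
        self.lines = list(lines or [])

    def get(self, row, col):
        # Any position past the stored text reads as a blank.
        if row < len(self.lines) and col < len(self.lines[row]):
            return self.lines[row][col]
        return " "

    def overstrike(self, row, col, text):
        # Extend the file with empty lines if the cursor is below the last line.
        while len(self.lines) <= row:
            self.lines.append("")
        line = self.lines[row].ljust(col)  # pad with blanks up to the cursor
        # Typed characters replace whatever was already at those positions.
        self.lines[row] = line[:col] + text + line[col + len(text):]


plane = QuarterPlane(["abc"])
plane.overstrike(0, 1, "XY")   # replace characters in place
plane.overstrike(2, 4, "far")  # type beyond the end of the file
```

No spaces, tabs or newlines had to be inserted by hand to reach row 2, column 4, which is the contrast with stream editors drawn above.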

Associated with the file model is what happens when the user types a 
graphic character. Most stream editors use “insert mode”, where 
graphic characters are inserted at the current cursor position, as the 
default so that newline and tab will work correctly when treated as 
graphic characters. The default mode in Z is for graphic characters to 
overstrike or replace what is in the file at the current cursor location. 
The paradigm in Z for inserting text in a file is to move the cursor to the 
location where the insertion is to take place, insert one or more blank 
spaces or lines (usually with a cursor argument) and then type the new 
text over the white space. 

Z does have an “insert mode”, but this mode is transitory and is turned
off as soon as a non-graphic editor command is entered. Insert mode is
implicitly enabled by the charDelete and wordDelete commands, on the
assumption that the character or word deleted will be replaced. 

2.2. Command Syntax 

Our experience has shown that command syntax and argument 
selection are important issues in determining how easy an editor is to 
use. After many years of user feedback, the Z editor and its 
predecessors have developed a powerful syntax for specifying 
commands and arguments. Much of the speed and functionality of the 
editor are a result of this syntax. 

Editor commands are assigned to control characters that allow all 
editing tasks to be expressed in a concise yet natural way. Learning to 
use the editor is analogous to learning to touch-type on a regular 
typewriter. Except for the hardware cursor keys, little use is made of 
special terminal function keys, since they require the user to move his
hand away from the typewriter keyboard. The limitation of 32 control
characters that are available on an ASCII terminal is overcome by a 
software shift command that defines an alternative keyboard containing 
an additional 127 command characters. 

Editor commands may be preceded by a rich set of argument formats, 
although all commands have a default action when invoked with no 
argument. The default action is the most common usage of each 
command (e.g. fLines scrolls the display forward by seven lines). An
argument is entered with the arg editor command and is terminated by
any editor command that accepts an argument, including arg, which is
useful for entering more than one argument to an editor command. The 
arg command displays a mark at the current cursor location to indicate 
the beginning of the argument. 

The null argument is usually assigned the second most common variant 
of the command (e.g. arg fLines scrolls the current line to the top of the
display). A text argument is one in which only graphic characters are 
typed between the arg command and the terminating editor command. 
A cursor argument is one in which only cursor movement commands 
are used between the arg command and the terminating command. 

Cursor arguments are an important feature of the Z editor. They allow 
the user to select quickly areas of text to be deleted, moved, and 
manipulated. With the support of a good display terminal, the current 
selection can be highlighted as the cursor is moved. There are two types 
of cursor arguments: 

• A box argument selects a rectangle of text. The starting 
location of the cursor specifies one corner of the rectangle and 
the ending location specifies the opposite corner (a line 
argument is nothing but a box argument, where the width of 
the rectangle is zero). Box arguments are useful for moving 
blocks of text to the left or right and for manipulating 
columns of text. 

• A stream argument selects a stream of text from the starting 
location of the cursor to the ending location, preserving any 
intervening logical line breaks. Stream arguments are useful 
for joining and splitting up lines of text. 

Each editor command decides whether to interpret a cursor argument 
as a line/box argument or a stream argument.
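
The difference between the two kinds of cursor argument can be sketched as follows. The function names and the (row, column) coordinate convention are assumptions made for illustration, not taken from Z. The sketch treats the ending column as exclusive, so that a rectangle of zero width selects whole lines, matching the description of a line argument above.

```python
def box_select(lines, start, end):
    """Select the rectangle whose opposite corners are start and end."""
    (r1, c1), (r2, c2) = start, end
    top, bottom = min(r1, r2), max(r1, r2)
    left, right = min(c1, c2), max(c1, c2)
    rows = lines[top:bottom + 1]
    if left == right:
        # Zero width: the box degenerates into a line argument.
        return list(rows)
    # Short lines are padded with blanks, as in the quarter-plane model.
    return [row.ljust(right)[left:right] for row in rows]


def stream_select(lines, start, end):
    """Select the text from start up to end, keeping intervening line breaks."""
    (r1, c1), (r2, c2) = sorted([start, end])
    if r1 == r2:
        return lines[r1][c1:c2]
    parts = [lines[r1][c1:]] + lines[r1 + 1:r2] + [lines[r2][:c2]]
    return "\n".join(parts)


text = ["alpha beta", "gamma", "delta epsilon"]
column = box_select(text, (0, 0), (2, 3))
whole_lines = box_select(text, (0, 2), (1, 2))
run = stream_select(text, (0, 6), (2, 5))
```

The same pair of cursor positions therefore yields a column of text under one interpretation and a run of text under the other, which is why each command must choose.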

A fundamental idea in the Z editor is the overloading of each editor 
command. This is a result of the rich set of argument formats and the 
meta and shift commands. Meta is a prefix command that causes a 
minor modification in the execution of the next editor command. Shift 
is the software shift command that reads the next character literally and 
executes the editor command associated with that key on the alternate 
keyboard. The fSearch command is an example of this overloading:

• fSearch searches forward, but reuses the most recent fSearch
argument. 

• arg fSearch searches forward for the next occurrence of the 
blank terminated word that begins at the current cursor 
position. 

• arg STRING fSearch searches forward for the next 
occurrence of STRING and if found moves the cursor there. 

• arg arg STRING fSearch interprets STRING as an arbitrary 
regular expression and compiles and executes a machine code 
fragment to perform the search. 

• meta ... fSearch performs the search while ignoring case
distinctions. 

• shift fSearch performs a forward search for the next 
misspelled word. 

These rules may seem complicated, but they make it easy for users to 
learn the basic commands and then expand their expertise at their own 
pace. Editor commands with similar functions have similar overloading 
rules. For example, the rules above are also applied to the other 
commands that take search strings. 
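
One way to picture these overloading rules is as a dispatch on the number of arg commands, the presence of a text argument, and the meta prefix. The sketch below is only an illustration under assumptions: the Searcher class, its parameters, and the use of Python's re module are not from Z, which compiles regular expressions to machine code. The shift variant (search for a misspelled word) is omitted.

```python
import re


class Searcher:
    """Hypothetical dispatcher for the overloaded forward-search command."""

    def __init__(self):
        self.last = None  # source of the most recent search pattern

    def f_search(self, text, pos, arg_count=0, string=None, meta=False):
        if arg_count == 0:
            pattern = self.last                 # fSearch: reuse last argument
        elif string is None:
            words = text[pos:].split()          # arg fSearch: word at the cursor
            pattern = re.escape(words[0]) if words else None
        elif arg_count == 1:
            pattern = re.escape(string)         # arg STRING fSearch: literal text
        else:
            pattern = string                    # arg arg STRING fSearch: regex
        if pattern is None:
            return pos                          # nothing to search for
        self.last = pattern
        flags = re.IGNORECASE if meta else 0    # meta: ignore case distinctions
        match = re.compile(pattern, flags).search(text, pos + 1)
        # The cursor moves only if an occurrence is found.
        return match.start() if match else pos


sample = "the cat and The Cat and c.t"
searcher = Searcher()
first = searcher.f_search(sample, 0, 1, "cat")
```

A beginner needs only the literal form; the other branches can be learned later without changing how the basic command behaves.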


2.3. User Control

It is very important that the user feel in control of the editor and not the 
other way around. The most important manifestation of this 
philosophy is the ability to undo the effects of an editor command. A 
user will be understandably upset if he is unable to recover from 
accidentally doing a global replace of all blanks with the null string. Less
seriously, there is the case where the user presses the bSearch command 
instead of the delete command and finds himself no longer positioned 
where he wanted to do the delete. Both these cases are handled by the 
undo command that undoes the effect that one or more preceding editor 
commands had on the current file and the position of the display 
window and the cursor. For text changes made by typing over old 
characters, the undo command undoes the changes a line at a time. 
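
A minimal sketch of such an undo facility follows. It assumes the simplest possible scheme, a stack of full snapshots of the text, window position and cursor position taken before each command; the paper does not say how Z records changes, and the UndoLog name is hypothetical.

```python
class UndoLog:
    """Hypothetical undo stack holding one snapshot per editor command."""

    def __init__(self):
        self._snapshots = []

    def record(self, lines, window, cursor):
        # Copy the lines so later edits cannot alter the saved state.
        self._snapshots.append((list(lines), window, cursor))

    def undo(self):
        # Return the state before the most recent command, or None if empty.
        return self._snapshots.pop() if self._snapshots else None


log = UndoLog()
lines, window, cursor = ["a b c"], (0, 0), (0, 2)

# The accident described above: replace every blank with the null string.
log.record(lines, window, cursor)
lines = [line.replace(" ", "") for line in lines]
damaged = list(lines)

lines, window, cursor = log.undo()
```

Because the snapshot includes the window and cursor, the same mechanism also recovers from a mistaken bSearch that only moved the cursor.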

When the user executes a command by mistake, he usually recognizes 
this immediately and wants to tell the editor to cancel the current 
command. Most commands will finish before the user realizes his 
mistake and the undo command can be used to recover the previous 
state. For commands that can take a significant amount of time, such 
as changing files, searching for a string, entering an argument, the user 
should be able to cancel the command without having to wait for it to 
complete. The cancel command will cancel the current editor command 
cleanly. If there is no current command then cancel will halt the editor 
without saving the current file. 

The user has the ability to tailor certain functions of the editor to his 
particular preference. When the editor is first started, it looks for a 
profile file in the user’s directory. This file contains switch settings that 
modify the default behavior of the editor. For example, the user can 
control the assignment of editor commands to keyboard characters, the 
default argument to an editor command, default search modes, and 
software tab settings. The list includes about 25 separate quantities and 
is the result of several iterations between the users and the 
implementors. Certain profile switches may also be included in 
comment strings at the beginning of a file. The switches are only active 
while within the file, and the normal settings are restored when the user 
switches to another file. 

2.4. Visual Feedback 

It is important that a display editor keep the user informed of what it is 
doing. The Z editor attempts to make each command appear 
instantaneous (the time between a user keystroke and visual feedback is 
usually less than one second). For graphic characters the visual 
feedback is just the character itself, displayed at the current cursor 
position. Z takes advantage of the full-duplex character-echoing logic 
provided by the TOPS-20 monitor [20] and lets the operating system 
echo graphic characters and hardware cursor keys. Not only does this 
provide instantaneous response to user keystrokes, but it also reduces 
the load on the system because the editor process does not wake up until 
a non-graphic character is typed. 

Editor commands do not directly update the display to indicate changes 
made to the file. Instead, after each editor command, a display 
processor compares what it last wrote to the display with the current file 
representation, to determine what has changed. The comparison 
operation is cheap since it uses pointer equality to determine what has 
changed or moved. The display processor can be preempted by certain 
editor commands that have a high probability of changing the display. 
The benefit of this approach is that it separates the display logic from 
the rest of the editor, while providing the user with visual feedback after 
each command. This approach was inspired by the MIT EMACS
implementation [17].
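
The idea of comparing by pointer equality can be sketched as follows. The sketch assumes that each line is a separate object and that editing a line replaces its object rather than modifying it; the Line class, the redisplay function and the write_row callback are hypothetical. It handles only lines changed in place within a window of fixed height, not the detection of moved lines mentioned above.

```python
class Line:
    """Hypothetical line object; an edit creates a new Line."""

    def __init__(self, text):
        self.text = text


def redisplay(shown, current, write_row):
    """Rewrite only the rows whose line object differs from what was shown."""
    updated = 0
    for row, line in enumerate(current):
        # Identity test: no character-by-character comparison is needed.
        if row < len(shown) and shown[row] is line:
            continue
        write_row(row, line)
        updated += 1
    return list(current), updated


a, b, c = Line("one"), Line("two"), Line("three")
shown = [a, b, c]
current = [a, Line("TWO"), c]   # an editor command replaced the second line

written = []
shown, count = redisplay(shown, current, lambda r, l: written.append((r, l.text)))
```

The editing commands never touch the display in this arrangement; they only replace line objects, and the display processor finds the differences afterwards.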

In editing a large file, certain operations can take a significant amount 
of time to complete (e.g. reading and writing a file, searching for a string).
If this happens, the user may begin to wonder what the editor is doing. 
To avoid this problem, these commands maintain a line counter in the 
lower corner of the display, that keeps the user informed of the progress
of the editor command. The line counter is updated every one hundred
lines and is displayed only if the size of the current file is greater than a 
default amount. 

2.5. Performance versus Extensibility

In implementing Z, the emphasis was on fast response to user
keystrokes. Local users have had long experience with
high-performance display editors and would not accept an editor that
was less responsive than previous editors. We felt that the advantages 
of a user extensible editor implemented in LISP were clear [10, 7, 8, 21], 
given the success of LISP as an extensible programming language. The 
available TOPS-20 LISP implementations, however, did not provide 
the facilities for implementing a high-performance display editor. Since 
the goal was to implement an editor, not an efficient implementation of 
LISP, Z was implemented in the systems programming language 
BLISS [23], for which there existed a very good locally developed 
programming environment [22]. 

Rather than develop our own specialized macro or programming 
language to support user extensions, we felt it was better not to provide 
the feature at all. We decided to solicit user suggestions for 
improvement and respond to them as rapidly as possible. As a result, 
many of the existing features in Z are extensions suggested by users and 
were not included in the initial design. The suggestions were iterative 
and came about after substantial use of the editor. This would not have 
happened if the performance of the editor had been too poor to capture 
the interest of the user community. 

3. Text Editing Features 

In editing text, the editor “knows” about words, sentences, and 
paragraphs. Words and sentences are delimited by a set of punctuation 
characters specified by the user. Paragraphs are delimited by: 

• a blank line. This is the normal case. 

• a change in the prevailing left margin. This handles the case of 
labeled paragraphs, such as this one. 

• a document compiler command, such as for SCRIBE [13]. 

This information is cheaply computed on the fly whenever necessary 
and is not part of the internal representation of the file. 
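
The three rules can be sketched as a single pass over the lines. Several details here are assumptions rather than statements from the paper: the function name, the reading of "a change in the prevailing left margin" as a line that starts further left than the line before it (which fits labeled paragraphs), and the use of "@" as the prefix of a document compiler command.

```python
def paragraph_starts(lines, command_prefix="@"):
    """Return the indices of lines that begin a paragraph."""
    starts = []
    prev_indent = None  # None means the previous line was a delimiter
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Rules 1 and 3: blank lines and compiler commands delimit paragraphs.
        if not stripped or stripped.startswith(command_prefix):
            prev_indent = None
            continue
        indent = len(line) - len(line.lstrip())
        # Rule 2 (assumed reading): a move back to the left starts a paragraph.
        if prev_indent is None or indent < prev_indent:
            starts.append(i)
        prev_indent = indent
    return starts


document = [
    "First paragraph",
    "continues here.",
    "",
    "1. labeled item",
    "   continues",
    "2. second item",
    "   more",
    "@section(Next)",
    "After command.",
]
starts = paragraph_starts(document)
```

Nothing is stored between calls, which is consistent with computing the boundaries on the fly rather than keeping them in the file representation.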

3.1. Movement Commands 

The display window is the context for most editor commands. Display 
movement commands allow the window to be positioned anywhere in 
the quarter plane. The most recent seven window positions are 
remembered so that the user can flip back to a previous context without 
losing his current context. There are basic commands to move the 
window forward or backward by lines and pages and left or right by 
columns. The fSearch and bSearch commands, described previously, 
accept a search string and move the cursor forward or backward to the 
next occurrence of the search string. The mark command accepts either 
a numeric line number or a bookmark name and moves the cursor to 
that location. The mark command is also used to define bookmarks at 
the current cursor location. If a bookmark is in another file then the 
editor will automatically switch to that file. 

The cursor is the focus for almost all editor commands. The cursor 
movement commands allow the cursor to be moved anywhere within 
the current display window. The simple commands that move forward 
and backward by characters and up and down by lines are assigned to 
the cursor keys found on most ASCII display terminals. There are 
additional commands that move forward and backward by words, 
sentences or paragraphs. The cursor movement commands do not take 
arguments and thus can be used within a cursor argument to select text. 
If the cursor attempts to move beyond a display boundary, the cursor 
will either do nothing, wrap around, or cause the display window to 
scroll in the direction the cursor was attempting to move. The user 
profile specifies which of these actions is to occur for each of the four 
display boundaries. 


3.2. Text Editing Commands 

There are two different delete commands. Delete interprets its 
argument as a line/box cursor argument and sDelete interprets its 
argument as a stream cursor argument. Their default arguments are the 
current line and the current character respectively. Both move the 
deleted text into a “pick” buffer, whose name is the same as the 
command. The put command inserts a pick buffer at the current cursor 
location. Thus to move text from one location to another, the user first 
deletes the text, moves to the new location and then uses the put 
command to insert the text just deleted. The pick command copies text 
into a pick buffer without deleting anything. Thus the pick and put 
commands can be used to copy text from one location to another. 
Commands that modify a pick buffer can be preceded by meta. This 
causes the selected text to be appended to the pick buffer rather than 
replacing its contents. Meta pick is useful for collecting separate pieces 
of text. Commands that modify a pick buffer have a default pick buffer 
that they use. The put command, by default, uses the most recently 
modified pick buffer. These defaults may be overridden by preceding a 
command with the name of the pick buffer to use, in addition to any 
text selection argument. This allows a user to define multiple pick 
buffers for saving text. Pick buffers may also be defined in the user 
profile file; this is useful for defining templates for commonly used 
language constructs. 
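
The interplay of delete, pick, put, and meta can be sketched with a toy model. This is a minimal sketch, not the Z implementation: the file is modeled as one Python string with character offsets (stream semantics only), and the class and method names are hypothetical.

```python
from typing import Dict, Optional


class PickEditor:
    """Toy editor over a single string; positions are character offsets."""

    def __init__(self, text: str):
        self.text = text
        self.buffers: Dict[str, str] = {}
        self.last_modified: Optional[str] = None

    def _store(self, name: str, text: str, meta: bool) -> None:
        # meta appends to the pick buffer instead of replacing its contents
        self.buffers[name] = (self.buffers.get(name, "") if meta else "") + text
        self.last_modified = name

    def pick(self, start: int, end: int, buffer: str = "pick", meta: bool = False) -> None:
        """Copy text into a pick buffer without deleting anything."""
        self._store(buffer, self.text[start:end], meta)

    def delete(self, start: int, end: int, buffer: str = "delete", meta: bool = False) -> None:
        """Move text into the pick buffer named after the command."""
        self._store(buffer, self.text[start:end], meta)
        self.text = self.text[:start] + self.text[end:]

    def put(self, pos: int, buffer: Optional[str] = None) -> None:
        """Insert a pick buffer; default is the most recently modified one."""
        name = buffer if buffer is not None else self.last_modified
        if name is None:
            raise ValueError("no pick buffer has been modified yet")
        self.text = self.text[:pos] + self.buffers[name] + self.text[pos:]
```

Moving text is then a delete followed by a put, and collecting scattered pieces is a pick followed by further picks with meta set.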

There are also two insert commands, insert and sInsert, that work in an 
analogous manner. Instead of deleting text, they insert white space at 
the current cursor location. The put command is a special case of an 
insert command that inserts the contents of the most recently modified 
pick buffer. Preceding any of the inserting commands with meta causes 
text that would have been inserted to overwrite text at the current 
cursor location. 

3.3. Word Processing Commands 

Designing a complete on-line document editor is a formidable task that 
was outside of the scope of the Z editor. There are, however, several 
commands that prove useful, even when using a document compiler to 
format a document. The fill and justify commands fill or justify one or 
more paragraphs of text. The default is to format the current 
paragraph, with the current cursor location marking the left margin and 
the right-hand side of the display the right margin. Both these 
commands take a box argument that overrides the default margins. A 
line argument just overrides the default length. The formatting 
algorithm is simple, working on blank separated words, and does not 
include hyphenation. Text to the right of the right margin is moved so 
that it falls within the right margin. Text to the left of the left margin is 
left undisturbed, so that the commands can be used on a labeled 
paragraph or a program comment. If either command is preceded by 
meta, each sentence begins on a new line. This is useful for exploding a 
paragraph into individual sentences for easy editing. The flushLeft, 
flushRight, and center commands are useful for aligning text to a 
specified margin. This set of commands makes it easy to create short 
documents without resorting to the expense of a document compiler. If 
the user is preparing a large document then they allow him to clean up 
the text so that he can proofread the document in the editor before 
invoking the document compiler. 
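
A greedy fill over blank-separated words, without hyphenation, is short enough to show in full. The following Python sketch is an assumption about how such a routine might look, not the Z code; the function name and the column-based margin parameters are hypothetical, and justification and the handling of text left of the left margin are omitted.

```python
def fill(text: str, left: int, right: int) -> list[str]:
    """Fill blank-separated words between column `left` and column `right`."""
    width = right - left
    lines, current = [], ""
    for word in text.split():
        if current and len(current) + 1 + len(word) > width:
            lines.append(" " * left + current)  # line is full: emit it
            current = word
        else:
            current = word if not current else current + " " + word
    if current:
        lines.append(" " * left + current)
    return lines
```

A word longer than the available width is placed on a line of its own and overflows the right margin, which is the simplest behavior in the absence of hyphenation.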

The spell command advances the cursor to the next misspelled word. 
When first invoked, the editor loads Webster’s Second Edition 
dictionary, containing approximately 40,000 words. The dictionary is 
stored as a hash table containing 800,000 entries, where each entry is a 
single bit. To look up a word in the dictionary, the editor computes ten 
different hash functions in parallel. If all ten probes of the hash table 
return a one bit then the word is considered found. The hash functions 
are computed on the root word, after prefix and suffix stripping. If the 
word is not found in the dictionary then it is considered misspelled. 
This algorithm is probabilistic and was suggested in an article by Carter 
et al. [3]. 
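
The look-up scheme is what is now usually called a Bloom filter. The Python sketch below uses the table size and probe count given above, but the hash functions are an assumption (salted SHA-256, chosen only for convenience), prefix and suffix stripping is omitted, and the class name is hypothetical.

```python
import hashlib
import math


class BitTableDictionary:
    """Approximate word set: a bit table probed by several hash functions."""

    def __init__(self, bits: int = 800_000, probes: int = 10):
        self.bits = bits
        self.probes = probes
        self.table = bytearray((bits + 7) // 8)

    def _positions(self, word: str):
        # One salted hash per probe (assumption: the paper does not say
        # which hash functions Z uses).
        for i in range(self.probes):
            digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, word: str) -> None:
        for p in self._positions(word):
            self.table[p // 8] |= 1 << (p % 8)

    def __contains__(self, word: str) -> bool:
        return all(self.table[p // 8] & (1 << (p % 8))
                   for p in self._positions(word))


def false_positive_rate(bits: int, words: int, probes: int) -> float:
    """Standard estimate (1 - e^(-k*n/m))^k for a table of this kind."""
    return (1.0 - math.exp(-probes * words / bits)) ** probes
```

With 800,000 bits, 40,000 words, and ten probes the standard estimate is a little under one chance in ten thousand that a misspelled word is accepted; a word that is in the dictionary is never rejected.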


If a word is misspelled, the user has the option of ignoring it, adding it 
to the dictionary, or correcting it. The correction algorithm tries 
obvious corrections by deleting each letter, transposing each adjacent 
pair of letters, assuming each letter is wrong, and assuming there is one 
missing letter. This requires fewer than 54n probes of the hash table, 
where n is the length of the word. Given the approximate nature of the 
look-up algorithm, the correction algorithm is not used for words 
smaller than four letters. The correction algorithm quickly comes up 
with the right correction most of the time. When it does guess wrong, 
the user just presses cancel and makes the correction himself. 
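
The four kinds of obvious correction can be enumerated directly. In this Python sketch a plain set stands in for the hashed dictionary, the function names are hypothetical, and a 26-letter lowercase alphabet is assumed; under that assumption it generates 53n + 25 candidate strings for a word of n letters, the same order as the figure quoted above.

```python
import string


def candidates(word: str, alphabet: str = string.ascii_lowercase) -> list[str]:
    """All strings one obvious correction away from `word`."""
    n = len(word)
    out = [word[:i] + word[i + 1:] for i in range(n)]               # delete a letter
    out += [word[:i] + word[i + 1] + word[i] + word[i + 2:]
            for i in range(n - 1)]                                   # transpose a pair
    out += [word[:i] + c + word[i + 1:]
            for i in range(n) for c in alphabet if c != word[i]]     # one wrong letter
    out += [word[:i] + c + word[i:]
            for i in range(n + 1) for c in alphabet]                 # one missing letter
    return out


def correct(word: str, dictionary, min_length: int = 4) -> list[str]:
    """Candidates found in the dictionary; short words are not corrected."""
    if len(word) < min_length:
        return []
    found = []
    for candidate in candidates(word):
        if candidate in dictionary and candidate not in found:
            found.append(candidate)
    return found
```

Because each candidate is tested against a probabilistic table, a short word has proportionally more chances of a false match, which is one reading of why words under four letters are excluded.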

4. Program Editing Features 

As mentioned in the introduction, existing structure-oriented program 
editors have several disadvantages, such as increased complexity in the 
implementation, a restrictive user interface, and poor support for text 
editing. A partial solution proposed by Brown and Wood [2] allows the 
user to manipulate a program as text. The parse tree is mapped onto 
the text representation and used to implement structured selection and 
incremental compilation. Because it must maintain two separate views 
of a program, this solution is even more complicated than other 
program editors and does not address the problem of supporting 
multiple languages. The complexity issue is of real importance because 
it drastically affects the development cycle of an editor. 

With a text-oriented model of program structure, the design of the 
editor is much simpler. For each line of text, the editor knows only 
about quoted strings, an end-of-line comment, blank separated words, 
tab/backtab tokens, and balance tokens. A simple table-driven lexer 
takes a line of text and divides it into these categories. Each editor 
command is responsible for using this information to impose any 
additional structure, beyond the text representation, that is necessary to 
support the program-editing features. 
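
A table-driven line lexer of this kind might look as follows. This is a hedged sketch in Python, not the BLISS implementation: the LanguageTable fields, the function names, and the small BLISS-like table used in the example are all hypothetical, and a token that appears in more than one table is reported under the first category that matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageTable:
    comment: str         # text that starts an end-of-line comment
    quote: str           # string delimiter
    tab: frozenset       # tokens that open a block
    backtab: frozenset   # tokens that close a block
    balance: frozenset   # open and close balance tokens


def classify(word: str, table: LanguageTable) -> str:
    key = word.upper()
    if key in table.tab:
        return "tab"
    if key in table.backtab:
        return "backtab"
    if key in table.balance:
        return "balance"
    return "word"


def lex_line(line: str, table: LanguageTable) -> list[tuple[str, str]]:
    """Split one line into (category, text) pairs."""
    tokens = []
    punctuation = {t for t in table.balance if len(t) == 1}
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if ch.isspace():
            i += 1
        elif line.startswith(table.comment, i):
            tokens.append(("comment", line[i:]))   # runs to end of line
            break
        elif ch == table.quote:
            end = line.find(table.quote, i + 1)
            end = n if end < 0 else end + 1        # unterminated: take the rest
            tokens.append(("string", line[i:end]))
            i = end
        elif ch in punctuation:
            tokens.append((classify(ch, table), ch))
            i += 1
        else:
            j = i
            while (j < n and not line[j].isspace()
                   and line[j] != table.quote
                   and line[j] not in punctuation
                   and not line.startswith(table.comment, j)):
                j += 1
            tokens.append((classify(line[i:j], table), line[i:j]))
            i = j
    return tokens


# Hypothetical table for a BLISS-like language.
EXAMPLE = LanguageTable(
    comment="!",
    quote="'",
    tab=frozenset({"BEGIN", "THEN", "ELSE", "DO"}),
    backtab=frozenset({"END"}),
    balance=frozenset({"(", ")"}),
)
```

Supporting another language then amounts to supplying another table; the scanning loop does not change.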

This section discusses the program-editing features of Z and how they 
are implemented with a text-oriented model of program structure. For 
each language type there is a table, modifiable by the user, that 
categorizes the tokens for the language. Currently, the editor contains 
tables for LISP, BLISS, PASCAL, RATFOR, and APL. If a file is not 
one of these language types, then it is assumed to contain text, with the 
semantics described in the previous section. APL mode works only on 
display terminals that support the APL character set. 

4.1. Automatic Indentation 

Existing structure-oriented program editors store a program as a parse 
tree and use a pretty-printer to map the parse tree onto the display. 
Instead of having the editor play the dominant role in program 
formatting, we chose to have it “suggest” an indentation amount 
whenever the newline command is used to enter a line. The indentation 
is relative to the first non-blank character of the current line. For 
block-structured languages, the cursor position on the next line is 
determined as follows: 

• For each language type there is a table of tab and backtab 
tokens. When the newline command is invoked, the editor 
examines the last token on the current line and checks to see 
whether it is in either table. If it is, either a tab or backTab 
command is performed as indicated. This handles tokens that 
open and close blocks respectively, such as “begin” and “end.” 

• Otherwise it looks at the last token of the previous line and 
checks to see whether it is a tab token that implicitly opens a 
block. If it is, a backTab command is performed. This 
handles the case of a “WHILE.. .DO” statement with only one 
statement within the loop. 

• Otherwise it places the cursor in the same column as the first 
non-blank character of the current line. This handles lists of 
statements, field names, and so on. 
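
The three rules above can be written as one small function. The Python sketch below is illustrative only: the token sets are a hypothetical subset chosen for the example, the last token of a line is taken to be its last blank-separated word with any trailing semicolon removed (an assumption), and a tab stop is assumed to be four columns.

```python
TAB = {"BEGIN", "THEN", "ELSE", "DO"}   # tokens that open a block
BACKTAB = {"END"}                       # tokens that close a block
IMPLICIT = {"THEN", "ELSE", "DO"}       # tab tokens with no matching backtab token


def last_token(line: str) -> str:
    words = line.split()
    return words[-1].rstrip(";").upper() if words else ""


def suggest_indent(current: str, previous: str = "", tab_width: int = 4) -> int:
    """Column suggested for the line entered after `current`."""
    base = len(current) - len(current.lstrip(" "))
    token = last_token(current)
    if token in TAB:                        # rule 1: explicit tab token
        return base + tab_width
    if token in BACKTAB:                    # rule 1: explicit backtab token
        return max(base - tab_width, 0)
    if last_token(previous) in IMPLICIT:    # rule 2: one-statement block ends
        return max(base - tab_width, 0)
    return base                             # rule 3: keep the same column
```

For example, after a line ending in DO the suggestion is one tab stop in, and after the single statement that follows it the suggestion returns to the column of the DO line.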

For BLISS, the backtab token set is: TES, TESN, END, right 
parenthesis and the tab token set is: SET, NSET, BEGIN, left 
parenthesis, THEN, ELSE, DO with the last three tokens implicitly 
opening a block since they have no matching token in the backtab token 
table. Combined with the rules above, they encourage the following 
indentation style: 

FUNCTION ... BEGIN 
    IF ... THEN ( 
        .... 
    ELSE 
        WHILE ... DO 
    END; 

The advantages of this scheme are that it is simple (the implementation 
is only 50 lines of BLISS) and that it is right almost all of the time. 
Even when this algorithm guesses wrong, it is off by only one tab stop. 
By disabling the automatic indentation feature, the user can effect his 
own indentation style by hand. 

For languages that are not block-structured (e.g. LISP and APL) the 
algorithm is even simpler. For LISP, the algorithm places the cursor on 
the next line, in the column containing the left parenthesis of the last 
balanced expression, as determined by moving backwards from the end 
of the current line. This encourages the following indentation style: 

(DE FOO (BAR) 
        (COND (ATOM BAR) 
              (...) 
              (T (...))) 
        (...)) 

For APL the indentation algorithm is the same one used for text files, 
namely maintain the indentation of the current line. 

4.2. Balanced Expressions 

A balanced expression is any sequence of tokens, bracketed by a pair of 
unique balance tokens, such as open and close parentheses. The 
expression may span more than one line. For each language type there 
is a table of the open and close balance tokens. This approach handles 
parenthesized expressions in any language, the LISP S-expression, the 
“begin... end” blocks found in block structured programming languages, 
and the document compiler SCRIBE [13]. 

There are two things a user wants to do with balanced expressions: 

• During typein he should be able to request that the editor 
close off the most recent open parenthesis, begin block, etc. If 
it is already closed off then the editor should indicate the 
location of the matching token. 

• When moving the cursor he should be able to move over 
balanced expressions as a single unit. 

The Z editor provides both these features. The balance command 
searches backwards for the next unmatched balance token and modifies 
the file at the current cursor location with the matching balance token. 
The fBalExpr and bBalExpr commands move forward or backward 
over balanced expressions, and are useful within a cursor argument for 
selecting an entire expression with one or two keystrokes. 

4.3. Structure Commands 

One feature stressed by program editors is the ability to select complete 
syntactic units with a single keystroke. A related feature is the ability to 
zoom out and suppress lower levels of detail (e.g. display only the top 
level procedure headers). Both these features would seem to require the 
existence of a parse tree. The Z editor, however, is able to provide these 
features by making a very simple assumption about how a programmer 
formats his program. 

A programmer maintains an indentation convention because it helps 
him visualize the structure of his program. Thus all the important 
information about the block structure of a program is contained in the 
indentation, provided the programmer is consistent. In particular, the 
indentation style described earlier provides this sort of consistent 
indentation. The indentation is interpreted in one of the following 
ways: 

• The display level of a line is the number of tab stops from the 
beginning of the line to the first non-blank character of the 
line. 

• The display level must be an integral number of tab stops and 
differ from the display level of the preceding line by plus or 
minus one tab stop. If this is not true then the display level is 
that of the preceding line. This takes care of continuation 
lines for long expressions, long parameter lists, and so on. 

The display level can easily be computed for each line as a function of 
the number of leading spaces and the current tab settings. 

The zoom command specifies the maximum level to display. If it is 
infinity then all lines are displayed. If it is zero then only the top level 
declarations and procedure definitions are displayed. Groups of lines 
that are not displayed appear as a single placeholder line. The cursor 
movement keys will select this line just like any other, except that all 
the lines indicated by the placeholder are selected. 


The cursor movement keys that move forward or backward over 
paragraphs now move forward or backward to the next block of lines 
with a display level of zero (i.e. those that begin at the left margin). This 
lets the user move by top-level S-expressions in LISP and top-level 
procedure definitions in block structured languages. 

The cursor movement keys that move forward or backward over 
sentences now move forward or backward to the next line that has a 
display level less than or equal to that of the current line. This lets the 
user move by statements within a block. 
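
The display level rule, the zoom command, and the two movement commands can be sketched together, since all of them depend only on leading spaces. This Python sketch is not the Z code: the function names are hypothetical, a tab stop is assumed to be four columns, blank lines are assumed to inherit the preceding level, the placeholder text "..." is an assumption, and "the next block of lines with a display level of zero" is read as the next level-zero line that follows a deeper line. Only the forward movements are shown.

```python
def display_levels(lines: list[str], tab_width: int = 4) -> list[int]:
    """Display level of every line, following the two rules in the text."""
    levels, prev = [], 0
    for line in lines:
        spaces = len(line) - len(line.lstrip(" "))
        level = prev                        # default: same as preceding line
        if line.strip() and spaces % tab_width == 0:
            candidate = spaces // tab_width
            if abs(candidate - prev) == 1:  # exactly one tab stop in or out
                level = candidate
        levels.append(level)
        prev = level
    return levels


def zoom(lines: list[str], max_level: float, tab_width: int = 4,
         placeholder: str = "...") -> list[str]:
    """Lines to display; each hidden group collapses to one placeholder."""
    shown, hidden = [], False
    for line, level in zip(lines, display_levels(lines, tab_width)):
        if level <= max_level:
            shown.append(line)
            hidden = False
        elif not hidden:
            shown.append(placeholder)
            hidden = True
    return shown


def next_statement(levels: list[int], current: int) -> int:
    """Index of the next line whose level is <= that of the current line."""
    for i in range(current + 1, len(levels)):
        if levels[i] <= levels[current]:
            return i
    return current


def next_top_level(levels: list[int], current: int) -> int:
    """Index of the next level-zero line that follows a deeper line."""
    for i in range(current + 1, len(levels)):
        if levels[i] == 0 and levels[i - 1] != 0:
            return i
    return current
```

Passing float("inf") as the maximum level displays every line, and passing zero leaves only the lines that begin at the left margin plus one placeholder per hidden group.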

These features, combined with the balanced expression cursor 
movement keys, provide for the selection of major syntactic units with 
just a few keystrokes. It is important to note that the editor uses the 
same simple visual cues that a programmer uses to understand his 
program, and does not require him to model the complex semantics of a 
parse tree representation of a program. 
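
The balanced expression commands need no parse tree either. The Python sketch below works on a list of tokens rather than on the file itself, which is a simplification; the function names and the example table of balance tokens are hypothetical.

```python
from typing import Dict, List, Optional


def closing_token(tokens: List[str], pairs: Dict[str, str]) -> Optional[str]:
    """Scan backwards for the nearest unmatched open token and return
    the token that would close it, or None if everything is closed."""
    closers = set(pairs.values())
    pending = []                    # close tokens still awaiting their opener
    for tok in reversed(tokens):
        if tok in closers:
            pending.append(tok)
        elif tok in pairs:
            if not pending:
                return pairs[tok]   # unmatched opener found
            if pending[-1] != pairs[tok]:
                return None         # mismatched nesting: give up
            pending.pop()
    return None


def skip_forward(tokens: List[str], start: int, pairs: Dict[str, str]) -> int:
    """Index just past the balanced expression that begins at `start`."""
    if tokens[start] not in pairs:
        return start + 1            # a plain token is its own expression
    stack = [pairs[tokens[start]]]
    i = start + 1
    while i < len(tokens) and stack:
        tok = tokens[i]
        if tok in pairs:
            stack.append(pairs[tok])
        elif tok == stack[-1]:
            stack.pop()
        i += 1
    return i


PAIRS = {"(": ")", "BEGIN": "END"}  # hypothetical table of balance tokens
```

The first function corresponds to the balance command and the second to forward movement over a balanced expression; backward movement is the mirror image.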

4.4. Program Compilation 

A frequent complaint with many programming environments is the 
large amount of time spent flipping back and forth between the editor 
and the compiler while debugging the syntax errors in a program. The 
time spent watching the compiler is unproductive, and it is often 
necessary to make a list of line numbers and error messages manually, 
before reentering the editor to fix the errors. In the past, local editors 
have tackled this problem by supporting multiple display windows and 
having a special entry procedure that displays the source file at the top 
of the display and the error file in a small window at the bottom. This 
solution still requires the user to translate error line numbers into editor 
command sequences that will position the cursor near the point of the 
error. 

Structure-oriented program editors enforce syntactic correctness by 
disallowing the creation of invalid programs either through the use of 
templates [18, 16] or by doing incremental compilation at the statement 
or procedure level [2]. Both methods allow the editor to move the 
cursor directly to the source of the error when it is discovered. The 
benefits hardly seem worth the expense of integrating the compiler into 
the editor. 

We feel that the programmer is the person best able to decide when his 
program is in a state ready for compilation. Existing compilers are 
perfectly able to locate errors. The problem lies in the lack of 
communication between compilers and editors. We sought to improve 
this communication to allow the editor to provide the functionality of a 
program editor, while at the same time supporting many different 
languages, rather than just one. 

The compile command accepts the name of a file to compile, which 
defaults to the current file. The compilation request is processed 
asynchronously, allowing the user to continue editing. When the 
compilation completes, a message is displayed at the bottom of the 
display. The user can either continue editing or use the compile 
command to display any error messages from the compilation. Each 
message is displayed at the bottom of the display, after moving the 
cursor to the location of the error, switching files if necessary. 

The editor communicates with an asynchronous child process that 
places the compilation request on a queue. The child process sends a 
message to the editor when the compilation is done. Any error 
messages that occurred during the compilation will have been appended 
to a message file by the child process. The file contains one line for each 
error message, consisting of the name of the file containing the error, 
the line and column number of the error, and an error message. 
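
Reading such a message file is straightforward. The exact layout of a line is not given here, so the Python sketch below assumes four tab-separated fields in the order listed above; the type and function names are hypothetical.

```python
from typing import List, NamedTuple


class CompileError(NamedTuple):
    file: str
    line: int
    column: int
    message: str


def parse_message_file(text: str) -> List[CompileError]:
    """Parse one error per line: file, line, column, message.

    The tab-separated layout is an assumption made for this sketch.
    """
    errors = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        file, line, column, message = raw.split("\t", 3)
        errors.append(CompileError(file, int(line), int(column), message))
    return errors
```

Each record carries everything the editor needs to switch files, position the cursor, and show the message at the bottom of the display.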

The child process does all the work by maintaining a queue of 
compilation requests, determining which compiler to invoke based on 
the file type, formatting the compiler command and analyzing the 
output of the compiler, translating any error messages into the format 
described above. The child process controls one compilation at a time 
with the compiler running in yet another process. The compiler process 
communicates with the child process via a pipe [14]. The 
language-dependent information consists of the name of the compiler, a 
routine to format the compiler command, and a routine to analyze the 
output of the compiler a line at a time. The output routine is the most 
difficult since it must be able to deal with nested source files, warning 
messages, etc. To make this feasible, local modifications were made to 
several compilers to provide adequate information to the output 
routine. 
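
The division of labor can be sketched as a queue plus a table of language-dependent entries. Everything in this Python sketch is hypothetical: the class and field names, the use of the file extension as the file type, the "file:line:column: message" compiler output format, and the run callable that stands in for launching the compiler in its own process.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Language:
    compiler: str                                  # name of the compiler
    format_command: Callable[[str], str]           # file name -> command line
    analyze_line: Callable[[str], Optional[str]]   # output line -> message line


def analyze_colon_format(line: str) -> Optional[str]:
    """Turn 'file:line:col: text' into a tab-separated message-file line."""
    parts = line.split(":", 3)
    if len(parts) == 4 and parts[1].isdigit() and parts[2].isdigit():
        return "\t".join([parts[0], parts[1], parts[2], parts[3].strip()])
    return None                                    # not an error message


class CompileQueue:
    """Processes one compilation request at a time."""

    def __init__(self, languages: Dict[str, Language],
                 run: Callable[[str], Iterable[str]]):
        self.languages = languages   # file type -> language entry
        self.run = run               # stand-in for the compiler process
        self.pending = deque()
        self.messages: List[str] = []

    def request(self, filename: str) -> None:
        self.pending.append(filename)

    def process_next(self) -> str:
        filename = self.pending.popleft()
        language = self.languages[filename.rsplit(".", 1)[-1].lower()]
        for output in self.run(language.format_command(filename)):
            record = language.analyze_line(output)
            if record is not None:
                self.messages.append(record)
        return filename              # the editor is notified of completion
```

Adding a compiler means adding one Language entry; the queueing logic is shared by all of them.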

The compile command is one of the more popular commands of the 
editor since it eliminates a boring task that most people consider 
inappropriate in this day and age of computer-assisted programming. 
Currently the child process under the editor knows about the following 
compilers: LISP, BLISS, PASCAL, RATFOR, FORTRAN, 
MACRO, SCRIBE, and RUNOFF. The last two are document 
compilers. 

4.5. Interfacing to the External World 

Whether preparing a document or writing a program, the user spends a 
large amount of time in the editor. It is important that an editor be able 
to communicate with other components of the computing environment. 
The exit command of the editor provides a clean method for 
communicating with the environment. This command uses a simple 
message passing scheme that allows an arbitrary string of commands to 
be passed to the process (fork) controlling the editor. In our local 
computing environment, this fork is usually a program called MUF 
(Multiple User Forks) [6]. 

MUF is a program that provides simple facilities for managing one or 
more child forks. Each child fork under MUF is associated with a 
single control character. Typically one fork is running interactively at 
any one time, with the other forks either suspended or running 
asynchronously with their primary input and output redirected to a file. 

The user switches from one fork to another merely by typing the control 
character assigned to the destination fork. This causes the current fork 
to be suspended and the destination fork to be created or continued. 

MUF allows several different contexts to be maintained. There is 
usually an editor fork, a TOPS-20 command interpreter fork, a fork for 
a language interpreter, and several on-line documentation forks. The 
last are nothing more than saved versions of the Z editor with large 
documentation files preloaded and bookmarks set for every system call 
name, runtime routine name, editor command name, and so on. The 
following scenarios give a flavor for the interaction between the editor 
and MUF. 

Suppose the user is in the editor under MUF and the T fork has the 
on-line documentation for the BLISS runtime library. A short 
sequence of editor commands will display the documentation for a 
particular routine (e.g. arg T exit arg RoutineName mark). The exit 
command is executed by the editor fork, and passes the T to MUF. 
MUF invokes the T fork and the T fork processes the mark command 
to position the display window at the documentation for RoutineName. 
Exiting the T fork causes the editor fork to be resumed. 

Suppose the editor fork is A and the L fork is running the LISP 
interpreter. The user has just discovered a bug in one of his functions. 
Typing control-A will switch him to the editor fork, which will display 
the last file edited. If that is the file containing the function in error then 
he makes the correction, selects the function with the cursor movement 
keys, and exits the editor. The editor tells MUF to start the LISP fork 
with its primary input redirected to the selected text. The function in 
error has now been redefined and the user can proceed with debugging. 
All this is done with a minimum of keystrokes, while maintaining the 
state of both forks. 

These scenarios demonstrate the effectiveness of such a simple 
communication protocol when combined with a program like MUF. 

While MUF allows one to maintain the state of several forks at once, it 
does not allow one to maintain the state of the display output of those 
forks. This led to the development of the session manager, a 
one-window version of the INTERLISP programmer’s assistant [19]. 
Any line-oriented program may be placed in the window. There is only 
one window because of the small screen size of our terminals, and 
because the facilities of TOPS-20 make such windows fairly expensive. 
In the session manager, the editor has two modes. In edit mode the 
user may give any normal Z editing commands, such as scrolling or 
looking at other files. The only difference is that the exit command 
switches the editor to SM mode. In SM mode, every keystroke the user 
types is given as input to the program, and any program output is 
displayed on the screen and simultaneously recorded in a typescript file. 

In SM mode, the program behaves exactly as it does when run under 
the TOPS-20 command interpreter, except that when the user types “up 
cursor” the session manager suspends the program and switches the 
editor back to edit mode with the typescript file as the current file. This 
file can be edited like any other file and the exit command takes a cursor 
argument as before, except that the argument is given as input to the 
program. Visual fidelity between the display and the typescript file is 
always maintained, i.e. switching from SM mode to edit mode does not 
require any rewriting of the screen. The session manager remembers the 
location of the most recent 25 input lines read by the program, which 
makes it very easy to move up into the typescript file, make a correction 
to an input line and then give the corrected text back to the program. 
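
Remembering the most recent input lines amounts to a bounded list of positions in the typescript. A minimal Python sketch follows, assuming the typescript is held as a list of lines; the class and method names are hypothetical.

```python
from collections import deque
from typing import List, Tuple


class Typescript:
    """Records program output and user input, remembering recent inputs."""

    def __init__(self, remembered: int = 25):
        self.lines: List[str] = []
        self.inputs = deque(maxlen=remembered)   # line numbers of input lines

    def record_output(self, text: str) -> None:
        self.lines.append(text)

    def record_input(self, text: str) -> None:
        self.inputs.append(len(self.lines))      # location of this input line
        self.lines.append(text)

    def recent_inputs(self) -> List[Tuple[int, str]]:
        """Oldest first: (line number, text) of the remembered inputs."""
        return [(i, self.lines[i]) for i in self.inputs]
```

Once the limit is reached, each new input line displaces the oldest remembered location while the typescript itself keeps every line.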

The session manager is used quite heavily by users of interpreted 
languages, because it provides a history of interactions with the 
interpreter. A typical mode of operation is illustrated by this process 
structure: 

MUF as top fork 
    on-line documentation forks 
    any other display forks 
    Z editor fork 
        MUF fork under session manager 
            TOPS-20 command interpreter 
            one or more language interpreters 

5. Conclusion 

Z is a high performance program editor that combines speed and 
functionality with a simple text model of the file. Z is able to provide 
the same level of functionality as existing structure-oriented program 
editors, except that Z supports many different programming languages 
instead of just one. At the same time, Z is a very good text editor in its 
own right. 




Acknowledgments 

Z is only one of many software tools developed by Yale graduate 
students in the last few years. These tools were all developed using the 
Yale BLISS programming environment [22] for TOPS-20; their 
implementation would have taken much longer in most other 
environments. Z owes much to previous Yale editor implementations, 
namely A, C, D, E, F and EE. John Ellis conceived and implemented 
the session manager. Bob Nix suggested the hashing algorithm used by 
the spell command and implemented the regular expression compiler 
used by the search routines. Nat Mishkin found and fixed many bugs 
and patiently helped many novice users. The large user community at 
Yale, mostly not system programmers, was extremely helpful. Because 
of their suggestions and criticisms, Z quickly evolved into a useful tool. 




References 

[1] Alberga, C. N., A. L. Brown, G. B. Leeman Jr., M. Mikelsons, 
and M. N. Wegman. 

A Program Development Tool. 

Technical Report RC 7859, IBM Thomas J. Watson Research 
Center, Yorktown Heights, N.Y., September, 1979. 

[2] Brown, M. R., and S. R. Wood. 

A Display-oriented Program Editor. 

Extended Abstract, Yale University, New Haven, Ct., January, 
1979. 

[3] Carter, L., R. Floyd, J. Gill, G. Markowsky, and M. N. 
Wegman. 

Exact and Approximate Membership Testers. 

In Proceedings of the 10th Annual ACM Symposium on Theory 
of Computing, pages 59-65. ACM, May, 1978. 

[4] Donzeau-Gouge, V., G. Huet, G. Kahn, B. Lang, and J. J. Levy. 

A Structure Oriented Program Editor: A first step towards 
computer assisted programming. 

Technical Report 114, IRIA-LABORIA, France, April, 1975. 

[5] Donzeau-Gouge, V., G. Huet, G. Kahn, and B. Lang. 

Programming environments based on structured editors: the 
MENTOR experience. 

Technical Report, INRIA, France, May, 1980. 

[6] Ellis, J. R. 

MUF: Multiple User Forks. 

Technical Report 191, Yale University Computer Science 
Department, January, 1981. 

[7] EMACS Text Editor User's Guide, Order #CH27. 

Honeywell Information Systems Inc., 1979. 

[8] MULTICS EMACS Extension Writers' Guide, Order #052. 
Honeywell Information Systems Inc., 1980. 

[9] Engelbart, D. C., and W. K. English. 

A Research Center for Augmenting Human Intellect. 

In Proceedings of the 1968 FJCC, pages 395-410. AFIPS 
Conference Proceedings, Montvale N.J., December, 1968. 

[10] Greenberg, B. S. 

Prose and CONS (Multics Emacs: a commercial text-processing 
system in Lisp). 

In Conference Record of the 1980 LISP Conference, pages 6-12. 
Stanford University, August, 1980. 

[11] Irons, E. T., and F. M. Djorup. 

A CRT Editing System. 

Communications of the ACM 15(1):16-20, January, 1972. 


[12] Lampson, B. 

Alto User’s Handbook: Bravo Manual. 

Xerox Palo Alto Research Center, Palo Alto, Ca., 
September, 1979. 

[13] Reid, B. K. 

A High-Level Approach to Computer Document Formatting. 
In Seventh Annual ACM Symposium on Principles of 
Programming Languages, pages 24-31. ACM, January, 
1980. 

[14] Ritchie, D. M., and K. Thompson. 

The UNIX Time-Sharing System. 

Communications of the ACM 17(7):365-375, July, 1974. 

[15] Samuel, A. 

Essential E. 

Technical Report STAN-CS-80-796, Stanford University, 
March, 1980. 

[16] Shapiro, E., G. Collins, L. Johnson, and J. Ruttenberg. 

PASES: A Programming Environment for PASCAL. 

Presented at the Schlumberger Programming Environment 
Workshop, April, 1980. 

[17] Stallman, R. M. 

EMACS, The Extensible, Customizable, Self-Documenting 
Display Editor. 

Technical Report AI Memo 519, Massachusetts Institute of 
Technology, June, 1979. 

[18] Teitelbaum, T. 

The Cornell Program Synthesizer: a Microcomputer 
Implementation of PLCS. 

Technical Report TR79-370, Dept. of Computer Science, 
Cornell University, June, 1979. 

[19] Teitelman, W. 

A Display Oriented Programmer's Assistant. 

Technical Report SSL-79-9, Xerox Palo Alto Research Center, 
Palo Alto, Ca., March, 1977. 

[20] TOPS-20 Monitor Calls Reference Guide, Order 
#AA-4166-TM. 

Digital Equipment Corporation, Marlboro Mass., 1980. 

[21] Weinreb, Daniel L. 

A Real-Time Display Oriented Editor for the Lisp Machine. 
Undergraduate Thesis, MIT Department of EE & CS, January, 
1979. 

[22] Wood, Steven R., and John R. Ellis. 

A BLISS Programming Environment. 

Technical Report, Yale University Computer Science 
Department, April, 1981. 

[23] Wulf, W. A., D. B. Russell, and A. N. Habermann. 

BLISS: A Language for Systems Programming. 
Communications of the ACM 14(12):780-790, December, 1971. 




The Why and Wherefore of the Cornell Program Synthesizer 

Tim Teitelbaum, Thomas Reps, Susan Horwitz 


Department of Computer Science 
Cornell University 
Ithaca, NY 14853 


Abstract 

The Cornell Program Synthesizer is a syntax- 
directed programming environment that has been used 
in introductory programming courses since June, 
1979. We present our experience with the Syn- 
thesizer by introducing its main features, by 
presenting our basic principles of design, and by 
discussing important design decisions. 

1. Introduction 

The Cornell Program Synthesizer is an interac- 
tive programming environment with integrated facil- 
ities for creating, editing, executing, and debug- 
ging programs. The Synthesizer stimulates program 
conception at a high level of abstraction, promotes 
programming by step-wise refinement, and spares the 
user from the frustrations associated with syntac- 
tic detail. 

The design and implementation of the Syn- 
thesizer began in May, 1978. Prototype versions 
were operational under PDP-11 UNIX as well as on 
Terak (LSI-11) microcomputers in December [7,8,9]. 
Versions for the PDT-11, MiniMinc, and Berkeley VAX 
UNIX have since been implemented. 

The Synthesizer was first used in classes at 
Cornell in June, 1979. The success of this initial 
trial led to greatly increased usage: the Syn- 
thesizer is currently used at Cornell, Rutgers, 
Princeton, and Hamilton College by a total of about 
3000 students per year. 

The first language implemented for the Syn- 
thesizer was PL/CS, an instructional dialect of 
PL/I [2]. All program examples in this paper are 
given in PL/CS. Versions to support other program- 
ming languages are being implemented. 

Previous interactive syntax-directed program- 
ming environments related to our work include EMILY 
[6], MENTOR [4], and CAPS [11]. 

2. Introduction to the Synthesizer 

The cornerstone of the Synthesizer is its 
syntax-directed editor; entry and modification of 
program text are guided by a grammar for the host 
programming language. The incorporation of the 
grammar into the editor guarantees syntactically 
correct programs; there is no need for syntactic 
error repair because such errors are prevented on 
entry. 

Permission to copy without fee all or part of this material is granted 
provided that the copies are not made or distributed for direct 
commercial advantage, the ACM copyright notice and the title of the 
publication and its date appear, and notice is given that copying is by 
permission of the Association for Computing Machinery. To copy 
otherwise, or to republish, requires a fee and/or specific permission. 

© 1981 ACM 0-89791-043-5/81/0600/0008 $00.75 

All but the simplest statement types are 
predefined in the editor as templates. A template
is a formatted syntactic skeleton that contains the 
keywords, matched parentheses and other punctuation 
marks of the given statement form. The template 
includes placeholders at each position where addi- 
tional code is required to complete the statement. 
The placeholders serve as visual cues indicating 
the syntactic class of each component required to 
complete the statement. 

Assignment statements, expressions, and lists 
of variables are called phrases, and are entered
directly as text. Errors in phrases are detected 
immediately and can be corrected by local editing 
of the erroneous segment. 

Programs are created top-down by inserting new 
templates and phrases within the skeleton of previ- 
ously entered templates. Syntax error detection 
is immediate because placeholders can only be 
replaced by syntactically correct insertions. 
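
The idea that a placeholder accepts only an insertion of its own syntactic class can be sketched in a few lines. This is a minimal illustration, not the Synthesizer's implementation; the class names, the `expand` function, and the `{0}`-style skeleton notation are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical data model; none of these names come from the Synthesizer.
@dataclass
class Placeholder:
    syntactic_class: str              # e.g. "condition" or "statement"

@dataclass
class Node:
    syntactic_class: str              # class this template or phrase belongs to
    text: str                         # fixed skeleton; {0}, {1}, ... mark children
    children: list = field(default_factory=list)

def if_template():
    """Skeleton of an IF-THEN-ELSE statement with three placeholders."""
    return Node("statement", "IF ( {0} ) THEN {1} ELSE {2}",
                [Placeholder("condition"),
                 Placeholder("statement"),
                 Placeholder("statement")])

def expand(node, index, replacement):
    """Replace a placeholder, but only by a node of the same syntactic class."""
    slot = node.children[index]
    if not isinstance(slot, Placeholder):
        raise ValueError("slot is already expanded")
    if replacement.syntactic_class != slot.syntactic_class:
        raise ValueError(
            f"expected {slot.syntactic_class}, got {replacement.syntactic_class}")
    node.children[index] = replacement

def show(node):
    """Print representation: placeholders appear as their class names."""
    if isinstance(node, Placeholder):
        return node.syntactic_class
    return node.text.format(*(show(child) for child in node.children))
```

Under these assumptions, expanding the condition slot with a phrase of class "condition" succeeds, while offering a declaration where a statement is expected is rejected before the program is changed.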

There is no delay between editing and execu- 
tion because programs are translated into inter- 
pretable form during program entry. Program 
development and testing can be interleaved: execu- 
tion is suspended when an unexpanded placeholder is 
encountered and can be resumed after the place- 
holder has been expanded. 

2.1. Creating a program

The editing process is illustrated by writing 
a program to compute the absolute value of a given 
number. (The cursor is indicated by a box in the 
examples below.) 

Assume that we are already on the system and 
ready to create a program in the file named "abs". 
The screen would look as follows: 

editing abs 

object

The system is ready to create a program. The 
non-terminal "object" appears on the screen as a 
placeholder for the program. Commands, error mes- 
sages, and system status will appear above the 
dashed line - program text will appear below the 
dashed line. 





One inserts a template by typing a command 
followed by a special function key. The command 
for a PL/I main procedure template is ".main". 
After typing the five characters ".main", the 
screen appears as follows: 

.main editing abs 


object

The command is echoed above the dashed line, but is 
not obeyed until terminated by a cursor motion. 
Upon typing return, the system immediately
responds:

editing abs 


/* comment */

abs: PROCEDURE OPTIONS (MAIN);

{declaration}

{statement}

END abs; 

In one step, the placeholder "object" is replaced 
by the complete skeleton of a PL/I main procedure. 
The name of the procedure is "abs", inherited from 
the file name. 

Three new placeholders, "comment", "{declara- 
tion}" and "{statement}", identify locations where 
additional program elements can be inserted. The 
placeholders are descriptive names that serve as 
visual cues. The braces in the placeholder 
"{declaration}" indicate that a list of declara- 
tions is permitted there. Similarly, "{state- 
ment}" identifies a place for a list of statements. 

Any one of the three placeholders could be 
expanded next. It is simplest to enter the comment 
next as the cursor is already positioned there. 
Only the comment text is required as the delimiters 
"/*" and "*/" have already been provided. 

A comment is a phrase, thus any text typed at 
this point is inserted directly into the program. 
The first character typed replaces the placeholder 
"comment" whereupon the delimiter "*/" slides left 
to close the gap. As each additional character is 
typed, the delimiter "*/" slides right to make 
room. Using left and right, it is possible to 
position the cursor at any point within a phrase. 
As characters are inserted or deleted, the entire 
context to the right of the cursor immediately 
shifts to accommodate the change.

editing abs 


/* print the absolute value of an input integer□ */
abs: PROCEDURE OPTIONS (MAIN); 

{declaration} 

{statement} 

END abs; 

A mistake can be erased by typing rubout, 
whereby the "*/" shifts left again. If the entire 
text is rubbed out, the placeholder "comment"
reappears.

Having completed the comment, the cursor is 
moved to "{declaration}" by typing return. The 
cursor skips over "abs: PROCEDURE OPTIONS (MAIN)" 
because it is part of the generated program tem- 
plate and cannot be altered. 


Boldface words such as return denote single 
keys on the terminal. 


editing abs 


/* print the absolute value of an input integer */ 
abs: PROCEDURE OPTIONS (MAIN); 

{declaration}

{statement}

END abs; 

Two modes of program entry have been illus- 
trated: template insertion by command and direct 
insertion of phrases. 

Commands are used to insert templates for 
declarations and statement forms which contain 
lengthy keywords, parenthesized lists or 
DO ... END blocks. Commands give the user 
significant generative power while precluding 
the possibility of syntax error. The use of a 
single command to insert an entire construct 
emphasizes its unified, abstract purpose. The 
specific syntax used to represent the con- 
struct cannot be typed and cannot be modified. 
Only placeholders within a template can be 
expanded. 

Direct insertion of typed text is used for 
phrases such as comments, assignment state- 
ments, and lists of variables and expressions. 
Phrases consist mainly of English text, user 
defined symbols, and short operators. The use 
of commands at this level would be counter- 
productive. 

Continuing to develop the sample program, we 
generate a declaration template for FIXED variables 
by typing the command ".fx" followed by return: 

editing abs 


/* print the absolute value of an input integer */ 
abs: PROCEDURE OPTIONS (MAIN); 

DECLARE ( list-of-variables ) FIXED;

{statement} 

END abs; 

Typing "k" followed by return* replaces the 
list-of-variables placeholder: 

editing abs 


/* print the absolute value of an input integer */ 
abs: PROCEDURE OPTIONS (MAIN); 

DECLARE ( k ) FIXED; 

{declaration}

{statement}

END abs; 

Once again the cursor is positioned at "{declara- 
tion}". The editor guarantees the syntactic 
correctness of the program and will not accept a 
statement at this point. By simply typing 
return, however, "{declaration}" disappears and 
the cursor moves to "{statement}": 

editing abs 


/* print the absolute value of an input integer */ 
abs: PROCEDURE OPTIONS (MAIN); 

DECLARE ( k ) FIXED; 

{statement}

END abs;


* The cursor motion down would be preferable
here; see Section 2.2.






A template for the input statement GET-LIST is
generated by typing the command ".g" followed by
return:

editing abs 


/* print the absolute value of an input integer */ 
abs: PROCEDURE OPTIONS (MAIN); 

DECLARE ( k ) FIXED; 

GET LIST ( list-of-variables );

END abs; 

Once again, typing "k" followed by return
replaces the list-of-variables placeholder: 

editing abs 


/* print the absolute value of an input integer */ 
abs: PROCEDURE OPTIONS (MAIN); 

DECLARE ( k ) FIXED; 

GET LIST ( k ); 

{statement}

END abs; 

Typing ".i" followed by return gives the template 
for an IF-THEN-ELSE statement: 

editing abs 


/* print the absolute value of an input integer */ 
abs: PROCEDURE OPTIONS (MAIN); 

DECLARE ( k ) FIXED; 

GET LIST ( k ); 

IF ( condition )

THEN statement 
ELSE statement 
END abs; 

Typing "k<0" return replaces the condition
placeholder:

editing abs 


/* print the absolute value of an input integer */ 
abs: PROCEDURE OPTIONS (MAIN);
DECLARE ( k ) FIXED; 

GET LIST ( k ); 

IF ( k<0 ) 

THEN statement
ELSE statement 
END abs; 

It is not necessary to expand all placeholders
in a template. Typing ".pl" return replaces the
first statement placeholder with a PUT-LIST
statement template. Typing return moves the cursor to
the next statement placeholder, leaving the
list-of-expressions placeholder unexpanded. Typing
".pl" return adds another PUT-LIST statement
template:

editing abs 


/* print the absolute value of an input integer */ 
abs: PROCEDURE OPTIONS (MAIN); 

DECLARE ( k ) FIXED; 

GET LIST ( k ); 

IF ( k<0 ) 

THEN PUT LIST ( list-of-expressions ); 

ELSE PUT LIST ( list-of-expressions );

END abs; 

An error in typed text is detected as soon as 
the user attempts to move the cursor to another 
program element. For example, suppose we next type 
"kk". Then, as soon as return is typed, the bell
rings, a highlighted error message appears on the
top line, and the cursor is positioned at the point 
of the error: 

error undeclared variable editing abs


/* print the absolute value of an input integer */ 
abs: PROCEDURE OPTIONS (MAIN); 

DECLARE ( k ) FIXED; 

GET LIST ( k ); 

IF ( k<0 ) 

THEN PUT LIST ( list-of-expressions ); 

ELSE PUT LIST ( kk );

END abs; 

Typing clear erases the character at which the 
cursor is positioned, and corrects the program.

2.2. Moving the cursor 

All modifications of program text occur rela- 
tive to the current cursor position. Using the 
cursor control keys of the terminal, it is possible 
to position the cursor wherever insertions and 
deletions are permitted. The cursor can only be 
positioned where modifications are allowed. 

The cursor's motion through the print
representation of the program corresponds to a
preorder traversal of the underlying abstract syntax
tree. Up and down move the cursor one template,
phrase, or placeholder at a time. Possible
stopping points for the cursor using up and down
are indicated by underscores in the sample program
below:

/* print the absolute value of an input integer */ 
abs: PROCEDURE OPTIONS (MAIN); 

DECLARE ( k ) FIXED; 

GET LIST ( k );

IF ( k<0 )

THEN PUT LIST ( -k );

ELSE PUT LIST ( k );

END abs; 


Left and right differ from up and down by
also stopping at every character within a phrase.
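
The correspondence between cursor motion and a preorder traversal can be made concrete with a small sketch. The tuple-based tree, the helper names, and the choice to identify a stopping point by a (text, offset) pair are assumptions for illustration only; in particular, whether left and right also stop just past the last character of a phrase is not modelled here.

```python
def template(label, *children):
    """A template node, denoted on screen by its initial character."""
    return ("template", label, list(children))

def phrase(text):
    """A phrase node: text entered and edited directly."""
    return ("phrase", text, [])

def stops(node, by_character=False):
    """Yield cursor stopping points in preorder.

    With by_character=False this models up/down: one stop per template or
    phrase. With by_character=True it models left/right, which additionally
    stop at every character within a phrase.
    """
    kind, text, children = node
    if kind == "phrase" and by_character:
        for offset in range(len(text)):
            yield (text, offset)
    else:
        yield (text, 0)
    for child in children:
        yield from stops(child, by_character)

# The absolute value program as an (assumed) abstract syntax tree.
ABS = template("abs",
               phrase("print the absolute value of an input integer"),
               template("DECLARE", phrase("k")),
               template("GET LIST", phrase("k")),
               template("IF", phrase("k<0"),
                        template("PUT LIST", phrase("-k")),
                        template("PUT LIST", phrase("k"))))
```

Iterating over `stops(ABS)` visits the procedure, its comment, the declaration and its variable list, and so on down to the final `k`, in the same order in which repeated use of down would move the cursor.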

Return is also a cursor motion key, similar 
in function to down. It differs, however, by also 
stopping everywhere that a program element can be 
inserted into a list of non-terminals like 
{declaration} or {statement}: that is, at the 
beginning, at the end, and in between adjacent list 
elements. A placeholder appears at such list 
insertion points whenever the cursor is positioned 
there and disappears when the cursor is moved away. 
The figures below illustrate repeated use of 
return to advance the cursor: 

original screen 

/* demonstrate return */ 
demo: PROCEDURE OPTIONS (MAIN); 

DECLARE ( k ) FIXED;

GET LIST ( k ); 

END demo; 




after 1 return 

/* demonstrate return */ 
demo: PROCEDURE OPTIONS (MAIN); 

DECLARE ( k ) FIXED;

GET LIST ( k ); 

END demo; 

after 2 returns 

/* demonstrate return */ 
demo: PROCEDURE OPTIONS (MAIN); 

DECLARE ( k ) FIXED; 
{declaration}

GET LIST ( k ); 

END demo; 

after 3 returns 

/* demonstrate return */ 
demo: PROCEDURE OPTIONS (MAIN); 

DECLARE ( k ) FIXED; 

{statement}

GET LIST ( k ); 

END demo; 

after 4 returns 

/* demonstrate return */ 
demo: PROCEDURE OPTIONS (MAIN); 

DECLARE ( k ) FIXED;

GET LIST ( k );

END demo; 

after 5 returns 

/* demonstrate return */ 
demo: PROCEDURE OPTIONS (MAIN); 

DECLARE ( k ) FIXED;

GET LIST ( k );

END demo;

after 6 returns 

/* demonstrate return */ 
demo: PROCEDURE OPTIONS (MAIN); 

DECLARE ( k ) FIXED;

GET LIST ( k );

{statement}

END demo; 

More powerful structured cursor motions are 
also available. The two key sequence long down 
advances the cursor to the next element not struc- 
turally deeper in the program. The sequence long 
up moves backward similarly. Long return is 
like long down but also stops at "list insertion 
points". 

Thus, from the "G" of "GET ..." in the absolute
value program example, long down moves the
cursor to the "I" of "IF ..."; long up moves to
the "D" of "DECLARE ..."; long return moves to
"{statement}" in between "GET ..." and "IF ...".

The diagonal key moves the cursor to the 
immediately enclosing program element. From the 
"PUT LIST" within the "ELSE" clause, diagonal 
moves to the "I" of "IF". Another diagonal moves 
to the beginning of the comment at the top of the 
program. 
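
One way to picture these structured motions is over the preorder sequence of program elements annotated with nesting depth. The sketch below is an assumed model, not the Synthesizer's algorithm: long down is taken to be "the next element whose depth is not greater than the current one", long up the mirror image, and diagonal "the nearest preceding element of smaller depth". The tree encoding and function names are hypothetical.

```python
def preorder(node, depth=0):
    """Flatten a (label, children) tree into (label, depth) pairs."""
    label, children = node
    yield (label, depth)
    for child in children:
        yield from preorder(child, depth + 1)

def long_down(seq, i):
    """Index of the next element not structurally deeper than seq[i]."""
    depth = seq[i][1]
    for j in range(i + 1, len(seq)):
        if seq[j][1] <= depth:
            return j
    return i                      # no such element: stay put (assumption)

def long_up(seq, i):
    """Index of the previous element not structurally deeper than seq[i]."""
    depth = seq[i][1]
    for j in range(i - 1, -1, -1):
        if seq[j][1] <= depth:
            return j
    return i

def diagonal(seq, i):
    """Index of the immediately enclosing element of seq[i]."""
    depth = seq[i][1]
    for j in range(i - 1, -1, -1):
        if seq[j][1] < depth:
            return j
    return i

ABS = ("abs", [("comment", []),
               ("DECLARE", [("k", [])]),
               ("GET LIST", [("k", [])]),
               ("IF", [("k<0", []),
                       ("PUT LIST (then)", [("-k", [])]),
                       ("PUT LIST (else)", [("k", [])])])])
SEQ = list(preorder(ABS))
```

With this model, starting at index 4 ("GET LIST"), `long_down` reaches "IF" and `long_up` reaches "DECLARE"; two applications of `diagonal` from the PUT LIST in the ELSE clause reach first "IF" and then the procedure itself, which is displayed starting at the comment.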

Some templates contain optional components; for
example, DO-loops contain an optional loop-name.
Two mechanisms serve to minimize the visibility
of optional language features:

1) The only cursor motion that positions the 
cursor at an optional component is the command 
".o", meaning "move to optional part". 
Optional statement elements are transparent to 
all other cursor controls. 

2) Placeholders for optional components are 
only displayed when the cursor is positioned 
there. 

2.3. Modifying a program 

Phrases can be modified by first repositioning 
the cursor within the phrase. Characters can then 
be deleted or inserted in place. The syntactic 
correctness of edited text is verified when the 
cursor is moved away from the phrase. 

A program template is denoted by its initial 
character. For example, the "G" in 
"GET LIST ( k );" denotes the entire statement. 
When the cursor is positioned at such a point, 
editing actions refer to the entire program
element.

"Cutting and pasting", that is, modifying a 
program by relocating or changing structural units, 
can be accomplished by using the clip, delete, 
and insert keys. A single phrase or an entire 
template and its subordinate parts can be "clipped" 
or "deleted". The entire section of code disap- 
pears from the program and is replaced by the ori- 
ginal placeholder. 

Clipped code can be reinserted at any syntac- 
tically suitable place by repositioning the cursor 
and pressing insert. Clipped code is actually 
moved to a file named "CLIPPED" while delete 
(usually used to permanently erase a code segment) 
actually moves it to a file named "DELETED". Thus, 
deletion is reversible: in the event of an inadver- 
tent deletion, the original segment can be 
recovered by the command ".ins DELETED". 
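
The clip, delete, and insert behaviour described above can be sketched as operations on two named buffers. This is a simplified model under stated assumptions: the `Editor` class, its slot dictionary, and the rule that an insertion is "syntactically suitable" when the saved code came from a placeholder of the same name are all hypothetical simplifications.

```python
class Editor:
    """Toy model: each position holds a placeholder and, optionally, code."""

    def __init__(self, slots):
        # slots maps a position name to (placeholder, content or None)
        self.slots = {pos: [ph, content] for pos, (ph, content) in slots.items()}
        self.files = {}               # "CLIPPED" and "DELETED" buffers

    def _remove(self, pos, file_name):
        placeholder, content = self.slots[pos]
        if content is None:
            raise ValueError("nothing to remove at this position")
        self.files[file_name] = (placeholder, content)
        self.slots[pos][1] = None     # the original placeholder reappears

    def clip(self, pos):
        self._remove(pos, "CLIPPED")

    def delete(self, pos):
        self._remove(pos, "DELETED")  # still recoverable, so reversible

    def insert(self, pos, file_name="CLIPPED"):
        placeholder, content = self.slots[pos]
        if content is not None:
            raise ValueError("position is already expanded")
        if file_name not in self.files:
            raise ValueError("no saved code in " + file_name)
        saved_placeholder, saved_content = self.files[file_name]
        if saved_placeholder != placeholder:
            raise ValueError("saved code is not syntactically suitable here")
        self.slots[pos][1] = saved_content

    def show(self, pos):
        placeholder, content = self.slots[pos]
        return placeholder if content is None else content
```

In this model, `insert(pos, "DELETED")` plays the role of the ".ins DELETED" command: a deletion followed by that call restores the original segment.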

A separate command is available to clip a list 
of consecutive statements. The two-key sequence 
long clip serves this purpose. The user posi- 
tions the cursor at the first statement of the list 
and types long clip. The user then positions the 
cursor at the last statement of the list and completes
the command by typing ".". The Synthesizer
ensures that only complete syntactic units are
clipped. The entire list of statements is moved to 
the file "CLIPPED" where it is available for 
reinsertion. 

The clipping mechanism is used to enclose 
existing code in the scope of a new template. For 
example, in order to enclose the "GET..." and 
"IF..." statements in a loop: use long clip to 
clip the two statements: 

editing abs 


/* print the absolute value of an input integer */ 
abs: PROCEDURE OPTIONS (MAIN); 

DECLARE ( k ) FIXED; 

{statement}

END abs; 

type ".dw" to generate the DO-WHILE loop template: 


editing abs 


/* print the absolute value of an input integer */ 
abs: PROCEDURE OPTIONS (MAIN); 

DECLARE ( k ) FIXED; 

DO WHILE ( condition );

{statement}

END; 

END abs; 

enter the condition, and use insert to insert the 
clipped code into the body of the loop: 

editing abs 


/* print the absolute value of an input integer */ 
abs: PROCEDURE OPTIONS (MAIN); 

DECLARE ( k ) FIXED; 

DO WHILE ( '1'b );

GET LIST ( k );

IF ( k<0 ) 

THEN PUT LIST ( -k ); 

ELSE PUT LIST ( k ); 

END; 

END abs; 

Clipped code is automatically reindented with 
respect to its new context where reinserted. 

Declarations can be re-edited in the same way 
as the rest of the program, but such re-editing may 
cause temporary non-local errors. For example, in 
order to change the declaration of k from FIXED to 
FLOAT, one first deletes the FIXED declaration. 
This leads to several segments in the program that 
contain an undeclared variable. These erroneous 
segments are highlighted on the screen. When the 
FLOAT declaration for k is inserted, these 
highlighted areas are redisplayed in the normal
font.

2.4. Comment Templates

The limited number of lines displayed on video 
terminals hampers editing large files. Comment 
templates provide a mechanism for hiding details of 
a file, thereby allowing more of the program to be 
displayed. 

A comment template is a single program unit 
for expressing a program specification together 
with its refinement [3]: 

/* comment */ 

{statement}

It includes two placeholders: 

comment (the specification of WHAT to do), 

{statement} (the refinement saying HOW to do it). 

The {statement} placeholder is indented to show 
that it is the refinement of the specification pro- 
vided in the comment. Comment and {statement} are 
part of one template. 

The display of the refinement of a comment can 
be suppressed by positioning the cursor anywhere 
within the refinement and striking the ellipsis 
key. For example, 

/* Print absolute value */ 

IF ( k<0 )

THEN PUT LIST ( -k ); 

ELSE PUT LIST ( k ); 

would be redisplayed as 


/* Print absolute value */ 

...

Having hidden the details of the code, more of the 
program fits on the screen while the comment 
remains displayed to specify what the hidden 
refinement does. The hidden code at "..." can be
revealed by striking ellipsis again. Thus, com- 
ment templates allow selective display of the 
hierarchical structure of files. 
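
A comment template with a toggled refinement can be modelled as a small recursive structure. The sketch below is an assumed rendering scheme (four-space indentation, "..." for hidden code); the class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class CommentTemplate:
    comment: str                                     # WHAT to do
    refinement: list = field(default_factory=list)   # HOW: strings or nested templates
    hidden: bool = False                             # toggled by the ellipsis key

def toggle_ellipsis(template):
    """Hide a displayed refinement, or reveal a hidden one."""
    template.hidden = not template.hidden

def render(item, indent=0):
    """Return the display lines for a statement or a comment template."""
    pad = " " * indent
    if isinstance(item, str):
        return [pad + item]
    lines = [pad + "/* " + item.comment + " */"]
    if item.hidden:
        lines.append(pad + "    ...")                # refinement is elided
    else:
        for part in item.refinement:
            lines.extend(render(part, indent + 4))
    return lines
```

Because a refinement may itself contain comment templates, hiding an outer template removes every nested level from the display at once, while the outer comment remains visible as a summary.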

The ellipsis feature of comment templates pro- 
vides an incentive to use comments during program 
development rather than after the fact. Because it 
rewards a skillful, precise use of comments, this 
feature promotes good programming style and method. 

2.5. Execution 

Programs can be executed at any stage of 
development. The generation of interpretable code 
during program entry allows execution to begin 
immediately. Execution is suspended whenever an 
unexpanded placeholder is encountered, and control 
returns to the editor with the cursor positioned at 
the unexpanded placeholder. After the required 
code has been inserted, execution can be resumed. 
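
The interleaving of execution and editing can be illustrated with a toy interpreter that stops at the first unexpanded placeholder and can later be resumed from the same point. Representing statements as Python callables and the placeholder as `None` is an assumption made only for this sketch.

```python
PLACEHOLDER = None    # stands for an unexpanded {statement} (assumed encoding)

def run(statements, env, start=0):
    """Execute statements in order, beginning at index start.

    Returns the index of the unexpanded placeholder at which execution was
    suspended, or None if the program ran to completion.
    """
    for index in range(start, len(statements)):
        statement = statements[index]
        if statement is PLACEHOLDER:
            return index              # control returns to the "editor"
        statement(env)
    return None

def get_k(env):
    env["k"] = env["input"].pop(0)

def put_k(env):
    env["output"].append(env["k"])

def negate_if_negative(env):
    if env["k"] < 0:
        env["k"] = -env["k"]
```

A session under this model would call `run`, inspect the returned index, replace that entry of the statement list with real code, and call `run` again with `start` set to the same index; the environment carries the execution state across the suspension.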

The high transmission rate of a video display 
terminal allows the incorporation of unique runtime 
debugging aids. In essence, these diagnostic 
features provide multiple windows into the computer 
through which one observes a running program. 

2.5.1. Flow tracing 

The "trace" feature causes the screen to be
divided; program text is displayed in the upper 
portion, while output generated by execution is 
displayed below. As the program is executed, the 
screen cursor moves through the program text indi- 
cating the flow of execution. Stopping places for 
the cursor during execution are the same as during 
editing: one cursor jump for each template and 
phrase. 

During flow tracing, the program display is 
automatically redrawn whenever control passes out- 
side the display window. Judicious use of the 
ellipsis feature can eliminate the trace of unin- 
teresting sections of code and minimize undesirable 
redrawing of the program display. 

Flow tracing at full speed provides a visible 
performance measure: the distribution of light 
intensities at the various cursor locations indi- 
cates the fraction of time spent there. 

2.5.2. Pace and single-step 

The "pace" feature allows the user to slow 
execution to any speed. A syntax-directed 
"single-step" feature permits manual control of 
program execution. The step-size of each resumption 
is specified in terms of the template and phrase 
structure of the file. It is possible to request: 

- single-stepped execution of the current program
  element,
- atomic execution of the current program element,
- atomic completion of the enclosing template,
- atomic completion of the enclosing procedure,
- atomic completion of the remaining statements
  at the same indentation level.

Consider stepping through the following program
segment:




DO WHILE ( k<n ); 

IF ( k<0 )

THEN PUT LIST ( -k );

ELSE PUT LIST ( k ); 
k= k+1 ; 

END; 

Resume would advance the cursor to "k<0",
long resume would advance the cursor to
"k= k+1;" by executing the entire IF-statement,
diagonal would position the cursor at "DO" until
the loop is completed, long diagonal would position
the cursor at the top of the procedure until
control returns to the calling procedure, and
return would complete the current iteration of
the loop and position the cursor at "k<n".

In this way, the Synthesizer maintains a har- 
monious view of static and dynamic program struc- 
ture. The syntactic units of the editor are the 
computational units of the interpreter. 

2.5.3. Variable monitoring 

Selected variables can be monitored during 
execution. These variables and their values are 
displayed in a separate partition of the screen. 
The result of each assignment to a monitored vari- 
able immediately appears on the screen. A least- 
recently-updated replacement strategy is used when 
there is not enough room to display all monitored 
variables.
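
A least-recently-updated replacement strategy for the monitor partition can be sketched with an ordered dictionary. The `Monitor` class, its `rows` parameter (assumed to be at least 1), and the `assign` hook are hypothetical names; the sketch shows only the replacement policy, not the screen handling.

```python
from collections import OrderedDict

class Monitor:
    """Displays at most `rows` monitored variables (rows >= 1 assumed)."""

    def __init__(self, rows, watched):
        self.rows = rows
        self.watched = set(watched)
        self.shown = OrderedDict()    # name -> value, least recently updated first

    def assign(self, name, value):
        """Called on every assignment; updates the display if name is monitored."""
        if name not in self.watched:
            return
        if name in self.shown:
            self.shown.move_to_end(name)        # now the most recently updated
        elif len(self.shown) >= self.rows:
            self.shown.popitem(last=False)      # evict the least recently updated
        self.shown[name] = value
```

For example, with room for two variables and assignments to i, j, i, and k in that order, j is the one displaced when k first appears, because i was updated more recently.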

3. Development 

Goals in the development of the Synthesizer 
were: to provide a unified programming environment, 
to allow a high level of abstraction, to support 
top-down program development, and to encourage good 
documentation. 

In implementing the Synthesizer these goals 
were attained by following basic design principles, 
by responding to feedback from users of prototype 
versions, and by making careful tradeoff decisions. 

The next three sections expand upon these 
underlying design considerations. 

3.1. Principles of Design 

The following principles formed a basis for 
the design of the Synthesizer: 

specialization 

constraint 

consistency 

manual control 

immediate visual response 

multiple conceptual levels 

reversibility 

3.1.1. Specialization

Specialization means simplifying frequently 
performed and often lengthy tasks. This includes 
minimizing memorization, removing frustrations 
associated with syntactic detail, and decreasing 
the amount of time required to perform the tasks. 

The Synthesizer’s use of templates provides a 
simple and efficient way to enter and manipulate 
program constructs. Two or three character com- 
mands replace traditional time consuming and error 
prone methods for program entry. 


3.1.2. Constraint 

Constraint means preventing the user from per- 
forming actions that are counter-productive or 
meaningless.

Allowing statements that are provided as tem- 
plates to be typed as text invites errors; there- 
fore such statements can only be inserted by com- 
mand. It would be counter-productive to introduce 
syntax errors by changing characters within a tem- 
plate, thus templates are immutable. It would be a 
waste of time to position the cursor at a point 
where no change was permitted, thus a cursor motion 
always moves the cursor to a meaningful place. 

3.1.3. Consistency 

A consistent approach is one in which the 
entire user interface is based on a few fundamental 
rules and tenets. Mastering these few ideas 
enables the user to generate most operations 
required to use the system. Hansen [6] refers to 
this principle as "predictable behavior". A con- 
sistent approach allows the user to learn a new 
system facility without having to learn yet another 
complex language. 

In the Synthesizer, the fundamental tenet is 
that all operations are based on the underlying 
program structure. For example, the structured 
cursor motions available in the edit phase 
correspond to the structured single-stepping opera- 
tions available in the execution phase. 

This unifying idea along with a consistent 
command format make the Synthesizer a system in 
which most users quickly become experts. 

3.1.4. Manual Control 

All aspects of the programming process should 
be under manual control, except when this would 
conflict with the principles of specialization and 
constraint. 

In the Synthesizer, the principle of manual 
control influenced the following design
considerations:

- error detection versus error correction 

- manual elision versus automatic elision

- display of optional program components 

Errors are detected and announced rather than 
being corrected. Automatic error correction, when 
inconsistent with the user’s intent, is confusing 
and annoying. 

Elision is done only on command. Our approach
in this respect agrees with that of EMILY [6],
whereas LISPEDIT [1] provides only automatic
elision.

Placeholders for optional components such as 
loop names are displayed only on command. Place- 
holders for optional list elements appear only when 
the cursor motion return is used. Other cursor 
motions are available if the user prefers that the 
cursor not stop at placeholders for optional list 
elements.

Manual control of execution is provided by the 
structured single-stepping operations. 




3.1.5. Immediate Visual Response 

It is imperative that immediate visual feed- 
back be provided to allow the user to monitor the 
state of the system, correcting if necessary. Any 
screen-oriented editor provides such visual feedback
in terms of the effects of the user's editing
actions. The Synthesizer has several other
features that follow this basic principle:

1) Edit time error detection - Syntactic 
errors (in phrases) are detected as soon as 
they are entered, an error message is 
displayed, and the cursor is positioned at the 
error. Semantic errors, such as those intro- 
duced by deleting a declaration, also cause 
immediate visual response: the highlighting of 
semantically invalid variable uses. 

2) Run time error detection - Run time errors, 
such as dividing by zero or encountering an 
unexpanded placeholder, cause an error message 
to be displayed. A further visual cue is pro- 
vided: control returns to the editor with the 
cursor positioned at the point of the error. 

3) Run time monitoring - Flow tracing and 
variable value monitoring provide immediate 
visual response to changes in the execution 
state. 

3.1.6. Multiple Conceptual Levels 

A system should be able to reflect multiple 
views held by a single user. At various times a 
user may view his program at different levels of 
abstraction; the system should mirror the user's 
current view. 

In the Synthesizer, the use of comment tem- 
plates coupled with the ellipsis feature allows the 
user to change the print representation of the pro- 
gram to match the level of detail he has in mind. 

3.1.7. Reversibility 

All actions should be easily reversible. 

The Synthesizer is notably weak in this area. 
The clip, delete, and insert operations provide 
reversibility at edit time. Reverse execution is 
currently being implemented. 

3.2. Design Modifications 

Two design decisions were modified because of 
feedback from users of prototype versions. 

1) Originally, the cursor motions left and 
right could be used to move the cursor only within 
the text of a phrase. However, many users thought 
that left and right would move the cursor in 
those directions no matter where the cursor was 
positioned. This disparity between user model and 
actual system performance led to confusion and 
frustration, and so left and right were modified 
to act like up and down outside the scope of a 
phrase. 

2) An initial design goal was that programs 
be correct at all stages of development. In early 
versions, the cursor could not be moved away from a 
phrase that contained an error; either the error 
had to be corrected, or the phrase had to be 
deleted. Experience caused us to relax this res- 
triction. 


The implemented dialect of PL/I requires that 
all variables be declared at the head of the pro- 
cedure; thus, a phrase that contains an undeclared 
variable is erroneous. The natural way to correct 
such an error is to move to the head of the pro- 
cedure and insert the declaration; however, in the 
original Synthesizer, this could not be done 
without first deleting the phrase. 

In the current Synthesizer, the first attempt 
to move the cursor away from an erroneous phrase 
fails. The error is signaled by a bell and a mes- 
sage, and the cursor is positioned at the error. 
The error need not be corrected immediately; the 
cursor can be moved away, but the phrase is left 
highlighted as a visual reminder. 
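
The relaxed rule can be read as a two-step protocol: the first attempt to leave an erroneous phrase is refused, and a repeated attempt is allowed but leaves the phrase highlighted. The sketch below is one assumed way to express that; the class, its fields, and the simplification that the warning flag is cleared only by a successful move are not taken from the Synthesizer.

```python
class CursorPolicy:
    """Decides whether the cursor may leave the current phrase."""

    def __init__(self):
        self.warned = False           # an error was already signaled here
        self.highlighted = set()      # phrases left in an erroneous state

    def try_leave(self, phrase_id, has_error):
        """Return True if the cursor may move away from the phrase."""
        if not has_error:
            self.highlighted.discard(phrase_id)
            self.warned = False
            return True
        if not self.warned:
            self.warned = True        # bell and message; cursor stays at the error
            return False
        self.warned = False
        self.highlighted.add(phrase_id)   # visual reminder remains
        return True
```

Once the missing declaration has been inserted and the user returns to the phrase, a later call with `has_error=False` removes it from the highlighted set.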

3.3. Design Decisions 

In this section we discuss the rationale 
behind several critical decisions in the design of 
the Synthesizer with reference to related systems. 

3.3.1. Abstract structure editing vs text 
editing 

The question is what to edit, the abstract 
object or a textual representation of the object? 
Where does the Synthesizer lie on the spectrum 
between pure structure editing and traditional text 
editing? 

Pure structure editors such as EMILY [6] and 
GANDALF [5] require that every syntactic unit be 
inserted by command. In particular, expressions, 
even though displayed in infix notation, are 
entered in prefix order with the operators serving 
as commands. For example, to enter the expression
"(a*b)**((c+d)/e)", one would treat its
constituents in the following order: ** * a b / + c d e.
One of the advantages of this approach is the 
automatic display of parentheses implied by opera- 
tor precedence and the reduction of errors caused 
by misconceptions about precedence orderings. Such 
editors present a uniform and consistent conceptual 
framework to the user: the entire program is an 
abstract syntax tree. 
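
The relationship between prefix-order entry and the infix display can be illustrated by building a tree from prefix tokens and printing it with only the parentheses that precedence requires. The precedence table, the treatment of "**" as right-associative, and the restriction to binary operators are assumptions of this sketch, not a description of any of the cited editors.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "**": 3}
RIGHT_ASSOCIATIVE = {"**"}

def build(tokens):
    """Consume prefix-order tokens; return (tree, remaining tokens)."""
    head, rest = tokens[0], tokens[1:]
    if head in PRECEDENCE:            # an operator acts as a command
        left, rest = build(rest)
        right, rest = build(rest)
        return (head, left, right), rest
    return head, rest                 # an operand

def infix(tree, parent=None, side=None):
    """Display a tree in infix notation with the parentheses it needs."""
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    text = infix(left, op, "L") + op + infix(right, op, "R")
    if parent is None:
        return text
    if PRECEDENCE[op] < PRECEDENCE[parent]:
        return "(" + text + ")"
    same_level = PRECEDENCE[op] == PRECEDENCE[parent]
    # At equal precedence, parenthesize the operand on the non-associating side.
    against_grain = (side == "R") != (parent in RIGHT_ASSOCIATIVE)
    if same_level and against_grain:
        return "(" + text + ")"
    return text
```

Entering the tokens `** * a b / + c d e` yields a tree that displays as `(a*b)**((c+d)/e)`; the user never types a parenthesis, which is the advantage claimed for this style of entry.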

In designing the Synthesizer, we decided that 
preorder entry of infix expressions was awkward and 
therefore undesirable. Despite the observation by 
its proponents that "people learn to use and love
postfix pocket calculators", we felt that the zigzag
cursor motion in the infix display during prefix
entry would be confusing and distracting.
Furthermore, the need to delimit consecutive 
operands and operators leads to an increase in 
required keystrokes and a loss of typing con- 
tinuity. We also felt that infix expressions are 
more easily modified by text editing than by tree 
manipulation. 

Editors such as INTERLISP [10] and MENTOR
compromise by allowing textual input that becomes 
structure as soon as the input is parsed. 
Thereafter, only structural modifications are per- 
mitted. We consider the inconsistent treatment of 
program entry and program modification a major 
weakness of this approach. The program is not 
entered according to its structure; thus it is not 
obvious what the structure is. This is not a prob- 
lem with LISP which has a trivial abstract syntax; 
however, we expect this inconsistency is a problem 
for a language with a complicated syntax like PAS- 
CAL or PL/I. 




The compromise adopted in the design of the
Synthesizer is a strict partitioning of the
language into constructs edited as structure and
constructs edited as text. A program unit entered
as a template remains an abstract structure and
must be manipulated as a whole unit; a program unit
entered as text remains text and must be modified
as text. By permitting only one editing style for
each syntactic unit, we maintain a consistent
conceptual viewpoint. Our hybrid of text and
structure sacrifices little; most expressions are
short and have slight structure. Nor do we sacrifice
lexical structure; for example, it is possible to
globally rename the variable "i" without changing
every occurrence of the letter "i".

3.3.2. Monomorphic vs polymorphic prettyprinting

In the Synthesizer, the print representation 
of a file is independent of cursor position and can 
be changed only through the use of the ellipsis 
feature. By contrast, systems with polymorphic 
prettyprinting, such as LISPEDIT, in attempting to 
retain the entire program on the screen, perform 
automatic elision as well as compression of print
representations in response to cursor motion. 

We support the idea of multiple representa- 
tions, but believe that this should be under manual 
control. In essence, the polymorphic approach 
says, we know the cursor position, and that is 
enough to determine the user's field of interest; 
try to show as much as possible, but focus on the 
area around the cursor. The Synthesizer approach 
says, we don't know what the user wants, and we 
don't want to risk doing the wrong thing; we will 
not attempt to keep the entire program in view, but 
the ellipsis feature is available to allow the user 
to bring multiple areas of interest inside the win- 
dow. 

We also believe that a program should not 
undergo radical changes in shape. A programmer 
learns to find his way around a program based on 
its shape, and is apt to become confused if that 
shape is changed behind his back. In polymorphic 
systems, the shape of a code segment is dependent 
on its position relative to the cursor. For exam- 
ple, an IF-THEN-ELSE statement that is displayed on 
three lines when the cursor is close by may be 
compressed to a single line as the cursor moves 
away. In the Synthesizer, the shape of a code
segment changes only through elision, and, as this
removes all code subordinate to a comment, the 
basic shape of the program is preserved. 

3.3.3. Point cursor vs extended cursor

In the Synthesizer, the cursor designates an 
entire subtree of a program, yet is displayed as a 
point, as if it designated only a single character. 
LISPEDIT and GANDALF use an extended cursor, 
highlighting the entire frontier of each subtree. 

The extended cursor provides a clearer picture 
of the internal tree representation of a program 
than the point cursor. The extended cursor is also 
less ambiguous than the point cursor. For example, 
when the point cursor is positioned at the begin- 
ning of a phrase it is impossible to tell whether 
the cursor designates the entire phrase, or just 
the first character; there is no such ambiguity 
with the extended cursor. 


The point cursor, however, better supports
linear models of cursor motion. The cursor motions
up and down actually correspond to moving forward
and backward in a preorder traversal of the
program's abstract syntax tree. How many users
actually have this in mind when they move the
cursor? Probably not very many. A perfectly
reasonable alternate model for cursor motion is to
think of up and down as moving the cursor forward and
backward in terms of the normal reading order of
the program. The cursor moves in jumps rather than
a single character at a time, but its movement is
consistent with this point of view.
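
The correspondence between linear cursor motion and a preorder traversal can be sketched as follows. This is an illustrative Python fragment, not the Synthesizer's implementation; the tree encoding, the node labels, and the assumption that down steps forward while up steps backward are ours.

```python
# Hypothetical sketch: cursor stops are the nodes of the syntax tree listed
# in preorder, so stepping along the list follows normal reading order.
def preorder(node):
    """Yield node labels in preorder; a node is (label, [children])."""
    label, children = node
    yield label
    for child in children:
        yield from preorder(child)


program = ("IF", [("cond", []),
                  ("THEN", [("s1", []), ("s2", [])]),
                  ("ELSE", [("s3", [])])])
stops = list(preorder(program))


def move(position, step):
    """Assume 'down' is step=+1 and 'up' is step=-1, clamped to the list."""
    return max(0, min(len(stops) - 1, position + step))


print(stops)
```

A user who thinks only in terms of reading order never needs to know that the list of stops was produced by walking a tree.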

Yet another model for cursor motion is to 
think of up, down, left, and right as moving
the cursor in those directions on the screen. 
Actual cursor motion is not entirely consistent 
with this view (for example, down may sometimes 
move the cursor to the right) but it is possible to 
use the system under this model because of the 
immediate visual feedback provided. 

Using the Synthesizer with one of these alter- 
nate models of cursor motion decreases the level of 
sophistication needed to use the system, and thus 
increases the number of potential users. 

Another advantage of the point cursor is that 
it can be moved quickly through a program without 
causing the distracting and possibly time consuming 
highlighting and unhighlighting of text that would 
be done for the extended cursor. 

An attractive compromise, limited, however, to
terminals with enough fonts, would be to combine 
the two types of cursor and display a highlighted 
extended cursor containing an even more highlighted 
point cursor. 

3.3.4. Window repositioning algorithm

When the cursor is moved outside the frame of 
view, a new region of the file containing the cur- 
sor must be painted within the window. What should 
that new region be? The Synthesizer recenters the 
point cursor within the window; by contrast, GAN- 
DALF attempts to include the entire extent of the 
designated subtree within the window. 

Once again, our approach is that we don't know 
what the user really wants, and so prefer to give 
him manual control. Thus we provide a simple, 
straightforward redrawing scheme, and an easy way 
for the user to reposition the text within the win- 
dow. The Synthesizer user can adjust the position 
of the text within the window to obtain his desired 
view. If the redrawing algorithm forces the inclu- 
sion of the entire current subtree, some views of a 
program might be precluded. 

The Synthesizer's scheme can lead to excessive 
redrawing at execution. Suppose we execute a pro- 
gram containing the following segment: 

    GET LIST ( n );
    DO WHILE ( i<n );
       i = i+1;
       PUT LIST ( i );
       END;

with a screen that has room for just three lines of 
text at a time. The executable portion of the 
WHILE loop is exactly three lines long, and so 
could be displayed on the screen while it is exe- 
cuted with no redrawing of the screen. However, if 
recentering is required as the loop begins execution,
then the screen will alternate between the
following two views:

1)  GET LIST ( n );           2)  i = i+1;
    DO WHILE ( i<n );             PUT LIST ( i );
    i = i+1;                      END;

Perhaps the best redrawing scheme would be a 
combination of the Synthesizer's and GANDALF's: 
center the point cursor at edit time, and center
the designated subtree at execution. 
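
The alternation described above can be reproduced with a small simulation. The sketch below is a hedged illustration in Python, not the Synthesizer's code: the function names, the clamping at the ends of the file, and the line numbering (0 for GET LIST through 4 for END) are assumptions.

```python
# Hypothetical sketch of a recentering window policy on a 3-line screen.
def recenter(top, cursor, height, n_lines):
    """Keep the window if the cursor is visible; otherwise center the cursor."""
    if top <= cursor < top + height:
        return top
    new_top = cursor - height // 2
    return max(0, min(new_top, n_lines - height))


def simulate(cursor_path, height, n_lines, top=0):
    """Return the window's top line after each cursor position."""
    tops = []
    for cursor in cursor_path:
        top = recenter(top, cursor, height, n_lines)
        tops.append(top)
    return tops


# Lines: 0 GET LIST, 1 DO WHILE, 2 i = i+1, 3 PUT LIST, 4 END.
# The execution cursor visits line 0 once, then lines 1, 2, 3 on each iteration.
path = [0] + [1, 2, 3] * 3
tops = simulate(path, height=3, n_lines=5)
redraws = sum(1 for a, b in zip(tops, tops[1:]) if a != b)
print(tops, redraws)
```

Under these assumptions the window flips between the view starting at line 0 and the view starting at line 2, whereas a window fixed at line 1 would show lines 1 through 3 throughout the loop without any redrawing.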

4. Conclusion 

The Synthesizer has been used successfully in 
programming courses for two years. Although it is 
used primarily on relatively small programs, 
features such as comment templates, structured cursor
motion, and the ability to interleave editing
and execution make the Synthesizer well suited for 
developing large programs as well. Continuing 
research and development of the Synthesizer will 
increase its power and range of application. 

Systems such as the Synthesizer are built to 
conform to the restrictions of current programming 
languages. For example, the "dangling else" problem
is a problem only when a program is viewed as a
sequence of characters. Using the Synthesizer, 
there would be no ambiguity in inserting an IF-THEN 
statement inside an IF-THEN-ELSE statement; it is
disallowed only to conform with the rules of PL/I. 
Similarly, the use of a BEGIN END pair around a 
block of statements is obsolete; automatically gen- 
erated indentation level serves instead. Perhaps 
it is now time to design languages to take advan- 
tage of the features of syntax-directed programming 
environments like the Synthesizer. 

References

1.  Alberga, C.N., Brown, A.L., Leeman, G.B.,
    Mikelsons, M., Wegman, M.N. A Program Development
    Tool. Conference Record of the Eighth Annual
    Symposium on Principles of Programming
    Languages, Williamsburg, VA, January, 1981,
    pp. 92-104.

2.  Conway, R. and Constable, R. PL/CS: A disciplined
    subset of PL/I. Technical Report 76-293,
    Department of Computer Science, Cornell, 1976.

3.  Conway, R. and Gries, D. An Introduction to
    Programming: A Structured Approach Using
    PL/I and PL/C. Winthrop, 1979, pp. 135-137.

4.  Donzeau-Gouge, V., Huet, G., Kahn, G., Lang,
    B., and Levy, J.J. A structure-oriented program
    editor. Technical Report, IRIA-LABORIA,
    France, 1975.

5.  Habermann, A.N. An overview of the Gandalf
    project. Computer Science Research Review
    1978-79, Carnegie-Mellon University, 1979.

6.  Hansen, W. User engineering principles for
    interactive systems. Fall Joint Computer
    Conference, 1971.

7.  Teitelbaum, T. The Cornell program synthesizer:
    a microcomputer implementation of
    PL/CS. Tech. Report No. TR79-370, Department
    of Computer Science, Cornell University, June
    1979.

8.  Teitelbaum, T. The Cornell program synthesizer:
    a tutorial introduction. Tech.
    Report No. TR79-381, Dept. Computer Science,
    Cornell U., July 1979. Revised January 1980.

9.  Teitelbaum, T. and Reps, T. The Cornell Program
    Synthesizer: A syntax-directed programming
    environment. Comm. ACM, to appear.

10. Teitelman, W. INTERLISP reference manual.
    Xerox PARC, 1974.

11. Wilcox, T.R., Davis, A.M., and Tindall, M.H.
    The design and implementation of a table
    driven, interactive diagnostic programming
    system. Comm. ACM 19, 11 (November 1976), pp.
    609-616.




Department of Computer Science, University of Arizona,
Tucson, Arizona 85721



Abstract 

Program editors help users create syntactically correct 
programs. Though such editors normally edit parse trees, 
applying similar techniques to other tree structures that 
need editing helps both users and implementors. This 
paper describes an editor that accepts a grammar describ- 
ing a hierarchical data structure and allows the user to 
enter and edit arbitrary trees having this structure. It 
displays the pros and cons of this approach using instances 
of this editor that edit formatted documents, simple line 
drawings, and stick figures for trees. 

1. Introduction 

Program editors [Sandewall, Teitelbaum, van Dam] 
help users create programs. They prevent the entry of syn- 
tactically incorrect programs, they offer abbreviations for 
verbose constructs, and they display programs in a 
pleasant, consistent fashion. Though program editors are 
typically syntax-directed, and though structures other than 
parse trees require editing [Fraser, Fraser and Lopez, van 
Dam], little has been said about exploiting the generality 
that syntax-direction allows. For example, a syntax- 
directed editor might be given a description of the structure 
that document formatters impose on text. Users would be 
able to move sections and paragraphs as logical units, and, 
just as program editors compile code as it is entered, a 
document editor might format text as it is entered, display- 
ing the formatted result instead of interleaved text and for- 
matter commands. Implementors should also benefit. Just 
as compilers driven by formal language descriptions are 
usually easier to understand, code, and modify than
their ad hoc counterparts, an editor driven by a formal 
structure description should make it easier to create new 
editors (e.g., for new structures) and to modify old editors 
(e.g., to accommodate different tastes in formatting). 


Permission to copy without fee all or part of this material 
is granted provided that the copies are not made or distri- 
buted for direct commercial advantage, the ACM copy- 
right notice and the title of the publication and its date ap- 
pear, and notice is given that copying is by permission of 
the Association for Computing Machinery. To copy oth- 
erwise, or to republish, requires a fee and/or specific per- 
mission. 

© 1981 ACM 0-89791-043-5/81/0600/0017 $00.75


This paper describes a syntax-directed editor, sds, and 
its application to the problem of editing general data struc- 
tures. It displays the pros and cons of sds’ approach 
through examples of its instances. Though sds has been 
used to build a typical program editor (for a subset of the C 
programming language [Kernighan]), this paper will focus 
on less conventional applications: a binary-tree editor, an 
interactive document formatter, and a graphics editor. 
Section 2 presents these instances. Section 3 discusses sds' 
implementation and future. 

2. Example Editors 
2.1 A Binary-Tree Editor 

sds is best understood through examples, and the sim- 
plest instance of sds edits uninterpreted binary trees that it 
displays on a graphics terminal by connecting nodes with 
arrows. sds extracts all of its structure-dependent
parameters from a grammar that resembles grammars accepted by
typical compiler-compilers [Johnson]. The grammar 
describing binary trees has only one production: 

tree = value tree tree : dotree(value,tree,tree2) 

The syntax description appears before the colon. It 
says that a tree is a value and two subtrees. The grammar 
need not say that trees may be empty because sds allows 
any field to be left empty until it is convenient to fill it.

The semantic action appears after the colon. Like sds,
it is written in SNOBOL4. In general, id in the semantic
action refers to the first occurrence of nonterminal id in the
syntax description, and idn refers to the nth occurrence of
nonterminal id for n>1. Thus tree2 in the semantic action
above refers to the second occurrence of tree in the syntax
description. The semantic action above displays binary
trees by passing the value and subtrees to subroutine
dotree, which formats them for display. The code for the
binary-tree editor’s semantic routines is shown below. It is
included only to suggest how sds is instantiated; SNOBOL4
details are not important here. An explanation follows
the code.





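* Screen position of the root (px,py) and horizontal offset to children (dx).
        px = 39; py = 0; dx = 20
        DEFINE('dosub(x,bpx,bpy,px,py,dx)')
        DEFINE('dotree(v,l,r)')                               :(ends)
* dosub fails on an empty subtree and so returns the null string.
dosub   dosub = DIFFER(x) line(bpx,bpy + 1,px,py - 1) put(x)  :(RETURN)
dotree  dotree = curpos(px - SIZE(v) / 2, py) v
        dotree = dotree dosub(l,px,py,px - dx,py + 4,dx / 2)
        dotree = dotree dosub(r,px,py,px + dx,py + 4,dx / 2)  :(RETURN)
ends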

The first line initializes variables that hold the screen coor- 
dinates at which the root is to be displayed (px,py) and the 
horizontal displacement between the root and its descen- 
dants (dx). The remaining lines define dotree and its sub- 
routine dosub that draws subtrees. If a subtree is empty, 
dosub returns nothing. Otherwise, it returns code that 
draws a line from a node to one of its subtrees (line(...)) 
and then draws the subtree itself (put(x)). put is sds’ display 
routine. It invokes the semantic actions, and most semantic
actions call it recursively to display subtrees. dotree
produces code that centers a value at position (px,py), and
then calls dosub to produce code for the left and right
subtrees. These calls temporarily adjust px, py, and dx to
move subtrees down, right or left, and closer together. 
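
For readers unfamiliar with SNOBOL4, the placement rule can be restated as a short Python sketch. It is an approximation under stated assumptions, not a translation of sds: the function name `layout`, the tuple encoding of trees, and the choice to return coordinates instead of terminal control codes are ours. The constants 39, 0, 20, the vertical step of 4, and the halving of dx come from the listing above.

```python
# Hypothetical sketch: compute where each node value would be drawn.
def layout(tree, px=39, py=0, dx=20, out=None):
    """tree is (value, left, right) or None; returns [(value, column, row)]."""
    if out is None:
        out = []
    if tree is None:          # an empty subtree draws nothing
        return out
    value, left, right = tree
    out.append((value, px - len(value) // 2, py))      # center the value at px
    layout(left, px - dx, py + 4, dx // 2, out)        # down and to the left
    layout(right, px + dx, py + 4, dx // 2, out)       # down and to the right
    return out


def leaf(value):
    return (value, None, None)


tree = ("1", ("2", leaf("4"), leaf("5")), ("3", leaf("6"), leaf("7")))
for value, column, row in layout(tree):
    print(value, column, row)
```

Halving dx at each level is what draws sibling subtrees closer together as the tree deepens.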

This code is appended to the grammar, and a syntax 
preprocessor compiles the result into a record declaration 
for datatype tree (with fields value, tree, and tree2) and 
code to check the syntax of input and to format a binary 
tree for output. The resulting code is loaded with sds to 
form a complete editor. 
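
The step from a production to a record declaration can be sketched in Python. This is not sds' preprocessor; `field_names` and `make_record` are hypothetical helpers that only illustrate the naming rule stated earlier (the first occurrence of a nonterminal keeps its name, and the nth occurrence gets the suffix n).

```python
# Hypothetical sketch: derive a record type from the syntax part of a production.
from collections import Counter
from dataclasses import make_dataclass, field


def field_names(production):
    """'tree = value tree tree : ...' -> ('tree', ['value', 'tree', 'tree2'])."""
    syntax = production.split(":")[0]        # drop the semantic action, if any
    head, body = syntax.split("=")
    seen = Counter()
    names = []
    for symbol in body.split():
        seen[symbol] += 1
        names.append(symbol if seen[symbol] == 1 else f"{symbol}{seen[symbol]}")
    return head.strip(), names


def make_record(production):
    """Build a record class whose fields may all be left empty (None)."""
    name, names = field_names(production)
    return make_dataclass(name, [(n, object, field(default=None)) for n in names])


Tree = make_record("tree = value tree tree : dotree(value,tree,tree2)")
node = Tree(value="1")
print(node)
```

Defaulting every field to None mirrors the statement that any field may be left empty until it is convenient to fill it.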

sds is a screen editor. It always displays the current 
version of some record (called the ‘current’ record) in the 
structure being edited. Most semantic actions display 
records by recursively displaying their subfields, so sds 
usually displays an entire subtree of the complete ‘parse’ 
tree, though less ambitious semantic actions may be given. 
The user moves about by striking the terminal’s cursor con- 
trol keys: down moves to the current record’s first field, up 
moves back, right and left move to an adjacent field, and 
home moves to the root. For example, if sds is displaying 

           1
          / \
         2   3
        / \ / \
       4  5 6  7

down causes it to display 
1 

because the first component of a record of type tree is its 
value field. A subsequent right causes it to display 


         2
        / \
       4   5

because the second component of a record of type tree is its 
first subtree. A subsequent up would move back to the first 
display. Fields are traversed in this order because the 
grammar gives them in this order. Were the syntax of trees 
changed to 

tree = tree value tree : dotree(value,tree,tree2) 

the initial down would move to the left subtree, and the 
subsequent right would move to the value field.
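
The navigation rules above can be modeled with a minimal Python sketch. The classes `Record` and `Cursor` are hypothetical and stand in for sds' internal records; the only behavior taken from the text is that fields are ordered as in the grammar, down enters the first field, up returns to the enclosing record, right and left move between adjacent fields, and home returns to the root.

```python
# Hypothetical sketch of cursor-key navigation over records.
class Record:
    def __init__(self, kind, **fields):
        self.kind = kind
        self.fields = fields            # insertion order follows the grammar


class Cursor:
    def __init__(self, root):
        self.root = root
        self.path = []                  # (record, field_name) steps from the root

    def current(self):
        if not self.path:
            return self.root
        record, name = self.path[-1]
        return record.fields[name]

    def down(self):
        node = self.current()
        if isinstance(node, Record) and node.fields:
            self.path.append((node, next(iter(node.fields))))

    def up(self):
        if self.path:
            self.path.pop()

    def _sideways(self, step):
        if not self.path:
            return
        record, name = self.path[-1]
        names = list(record.fields)
        index = names.index(name) + step
        if 0 <= index < len(names):
            self.path[-1] = (record, names[index])

    def right(self):
        self._sideways(+1)

    def left(self):
        self._sideways(-1)

    def home(self):
        self.path.clear()


def tree(value, left=None, right=None):
    return Record("tree", value=value, tree=left, tree2=right)


root = tree("1", tree("2", tree("4"), tree("5")), tree("3", tree("6"), tree("7")))
cursor = Cursor(root)
cursor.down()                           # now on the root's value field
cursor.right()                          # now on the root's first subtree
print(cursor.current().fields["value"])
```

Reordering the keyword arguments in `tree` would change the traversal order in the same way that reordering the production does.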

All other sds commands are entered by typing a line of 
text.† To enter a terminal string or leaf, the user types the
string that is to replace the current record. Subtree dele- 
tion is a special case of this command — the user merely 
replaces the current record with the null string. To enter a 
record corresponding to a nonterminal, the user types the 
name of the nonterminal preceded by a period. After such 
a command, sds drops down to focus on its first empty 
field. For example, if the user types the command .tree, 
sds creates a new node of type tree and drops down to its 
value field so that the user may start filling in the new sub- 
tree. 

sds offers a few structure-independent commands.
.hide suppresses the current record (and thus its descendants)
in subsequent displays, saving screen space, and
.show causes the current subtree, if hidden, to appear once
again in subsequent displays. .w file writes the current
record and its descendants to file, and .r file reads a subtree
from such a file and replaces the current record with it.
.pick saves a pointer to the current record, and .put replaces
the current record with the last-picked record. .pick and
.put can be used to insert and delete parts of trees. For
example, a new node may be inserted above the root by 
picking the root, replacing it with a new node, and putting 
the old root down as one of the new root's subtrees. Alter- 
nately, the root may be deleted and replaced with one of its 
subtrees by picking the subtree and putting it down on the 
root. 
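
The insertion of a new node above the root can be traced with a short Python sketch. The `Session` class and its method names are hypothetical; they model only the behavior described above, namely that picking saves a pointer to the current record and putting replaces the current record with the last-picked one.

```python
# Hypothetical sketch of .pick and .put used to insert a node above the root.
def new_tree(value=None, left=None, right=None):
    return {"value": value, "tree": left, "tree2": right}


class Session:
    """Tracks the current slot (container, key) and the last-picked record."""

    def __init__(self, root):
        self.top = {"root": root}
        self.slot = (self.top, "root")
        self.picked = None

    def current(self):
        container, key = self.slot
        return container[key]

    def pick(self):
        self.picked = self.current()        # a reference, not a copy

    def put(self):
        container, key = self.slot
        container[key] = self.picked

    def replace(self, record):
        container, key = self.slot
        container[key] = record

    def focus(self, container, key):
        self.slot = (container, key)


s = Session(new_tree("old root"))
s.pick()                                    # remember the old root
s.replace(new_tree("new root"))             # a new node takes its place
s.focus(s.current(), "tree")                # move to the new node's first subtree
s.put()                                     # hang the old root there
print(s.top["root"])
```

Deleting the root is the mirror image: focus on a subtree, pick it, return to the root slot, and put.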

2.2 A Document Editor 

Another instance of sds edits simple documents, 
displaying not interleaved formatter commands and text, 
but the final formatted result [Coulouris, Shaw, Shaw et
al.]. It uses the same code as the binary-tree editor, but its 
grammar and semantic routines differ: 

paper = title sect     : center(title) nl nl put(sect)
sect  = header pp sect : header nl nl put(pp) put(sect)
pp    = text pp        : break(text) nl put(pp)

That is, a paragraph is some text and a pointer to the next
paragraph, a section is a header and pointers to its first
paragraph and the next section, and a paper is a title and a
pointer to the first section. Further, a paper is presented by
centering the title (center(title)) and appending two
‘newline’ characters (nl nl) and the formatted sections
(put(sect)); a section is displayed similarly, though its
header is not centered; a paragraph is displayed by inserting
newlines so as to fill each line (break(text)). The code
for center and break is appended to the grammar, and the
result is compiled into a list of record declarations (for
datatypes paper, sect, and pp) and code to check syntax on
entry and to format a document’s parse tree for output.

† sds' command language has been influenced by the Cornell
Program Synthesizer [Teitelbaum].
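
The three semantic actions can be paraphrased in Python to show how the formatted result is assembled. This is a sketch under assumptions, not the SNOBOL4 code appended to the grammar: the line width of 60, the dictionary encoding of records, and the use of `textwrap.fill` for break are all illustrative choices.

```python
# Hypothetical sketch of the document editor's display routine.
import textwrap

WIDTH = 60  # assumed line width


def center(title):
    return title.center(WIDTH).rstrip()


def break_text(text):
    """Insert newlines so as to fill each line."""
    return textwrap.fill(text, WIDTH)


def put(node):
    """Format a record by applying the action of its production."""
    if node is None:                      # empty fields display as nothing
        return ""
    kind = node["type"]
    if kind == "paper":
        return center(node["title"]) + "\n\n" + put(node["sect"])
    if kind == "sect":
        return node["header"] + "\n\n" + put(node["pp"]) + put(node["sect"])
    if kind == "pp":
        return break_text(node["text"]) + "\n" + put(node["pp"])
    raise ValueError("unknown record type: " + kind)


paper = {"type": "paper", "title": "Syntax-Directed Editing",
         "sect": {"type": "sect", "header": "Introduction",
                  "pp": {"type": "pp", "text": "Program editors help users...",
                         "pp": {"type": "pp", "text": "This paper describes...",
                                "pp": None}},
                  "sect": None}}
print(put(paper))
```

Changing how documents look means changing only `center` and `break_text` (or the actions that call them), which is the adaptability the grammar-driven design is meant to provide.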

Though a written description is a poor substitute for an 
interactive demonstration, the following trace suggests 
how sds is used. Each bit of indented text below describes 
the effect of one command. The prompt orients the user 
with the list of field names used to reach the current 
record.†

prompt:
command: .paper
     tells sds to create a record of type paper, drop down
     to its (null) title field, display it, and wait for a
     command. Before creating any record, sds checks to see
     that it is syntactically correct at this point in the tree.
     Had the user requested a paragraph, sds would have
     created nothing.

prompt: title
command: Syntax-Directed Editing
     tells sds to enter the given string as the title, advance
     to the paper’s sect field, display it, and wait for a
     command.

prompt: sect
command: .sect
     tells sds to create a sect record, attach it to the
     paper, and drop down to its first field.

prompt: sect header
command: Introduction
     tells sds to enter 'Introduction' as the section header
     and to advance to the section’s pp field.

prompt: sect pp
command: .pp
     tells sds to create a pp record and to drop down to its
     text field.

prompt: sect pp text
command: Program editors help users...
     tells sds to enter a string in this field and to advance
     to the field that will hold the next paragraph. This
     string may be arbitrarily long, and sds provides
     facilities like those of a conventional display editor
     [Irons] that may be used to edit a string before entering
     it into the structure.

prompt: sect pp pp
command: .pp
     tells sds to create a second paragraph and to drop
     down to its text field.

prompt: sect pp pp text
command: This paper describes...
     tells sds to enter a string there. At this point, the
     command

prompt: sect pp pp pp
command: up
     will format and display that paragraph,

prompt: sect pp pp
command: up
     will format and display both paragraphs, and

prompt: sect pp
command: down
     will return to the first paragraph’s text field, which sds
     will display and open for editing with the same
     conventional display editor that is available when typing
     commands.

† If a field may be occupied by items of several different
types, the type of the current resident is appended to the
field name so that subsequent field names will make sense.
Alternately, sds could be extended to offer a command to
help orient the user. It might ignore the semantic routines
and display the entire tree with boxes and arrows,
suppressing leaves and highlighting the current record.

Note that, besides checking the document for syntactic 
correctness as it is entered, sds guides the dialog with 
prompts that suggest what is to be entered next. While the 
dialog has been rather long, the user has typed only the 
commands, which are no wordier than those typed to a 
conventional document formatter [Ossanna, Reid]. In 
fact, sds’ representation of trees in permanent files closely 
resembles the input to conventional document formatters. 
The .r and .w commands read and write trees in a prefix
form. .w writes the type of the node and then it recursively
writes each of the node’s fields. For example, the docu- 
ment created above would be written as 

.paper
Syntax-Directed Editing
.sect
Introduction
.pp
Program editors help users...
.pp
This paper describes...
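
The prefix form lends itself to a pair of short recursive routines, sketched here in Python. This is not sds' file code. The paper does not say how empty fields are encoded, so the sketch assumes, purely for illustration, that an empty field is written as a blank line; the names `GRAMMAR`, `write`, and `read` are likewise hypothetical.

```python
# Hypothetical sketch of prefix-form writing and reading of a document tree.
GRAMMAR = {"paper": ["title", "sect"],
           "sect": ["header", "pp", "sect"],
           "pp": ["text", "pp"]}


def write(node, lines):
    """Append the node type, then each field in grammar order, recursively."""
    if node is None:
        lines.append("")                  # assumption: empty field -> blank line
    elif isinstance(node, str):
        lines.append(node)
    else:
        lines.append("." + node["type"])
        for name in GRAMMAR[node["type"]]:
            write(node.get(name), lines)
    return lines


def read(lines):
    """Rebuild a tree from an iterator over lines written by write()."""
    line = next(lines)
    if line == "":
        return None
    if line.startswith(".") and line[1:] in GRAMMAR:
        node = {"type": line[1:]}
        for name in GRAMMAR[node["type"]]:
            node[name] = read(lines)
        return node
    return line                           # any other line is a leaf string


doc = {"type": "paper", "title": "Syntax-Directed Editing",
       "sect": {"type": "sect", "header": "Introduction",
                "pp": {"type": "pp", "text": "Program editors help users...",
                       "pp": {"type": "pp", "text": "This paper describes...",
                              "pp": None}},
                "sect": None}}
lines = write(doc, [])
print("\n".join(lines))
```

A leaf whose text happens to begin with a period followed by a type name would be misread by this sketch; a real encoding would need an escape convention.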

This approach to document formatting has both 
advantages and disadvantages. On one hand, sds would be 
hard put to handle complex typography. For example, 
global problems like widow-suppression poorly fit the 
context-free model, and one would want a different com- 
mand syntax — say, codes embedded in text, either typed 
or entered via a menu [Ellis] — for specifying frequent font 
changes. However, sds is adequate for certain forms-driven
data entry and editing [Ellis] and for simple documents.
For example, it would be easy to extend the grammar
above to create a business-letter editor that would
prompt for the various fields (e.g., address, salutation) and 
assemble a properly-formatted letter from the responses. 
Also, sds makes it fairly easy to adapt semantic routines so 
that a document can be formatted in different ways to suit 
different tastes [Reid]. Finally, while the hierarchical view 
of documents is less conventional and thus less-understood 
than the linear view, it is useful often enough (e.g., by 
allowing one to insert, delete, and move whole sections as a 
unit) that it deserves closer examination. 

2.3 A Graphics Editor 

A less conventional instance of sds edits simple line 
drawings. Again, it uses the same driver as the editors 
above, but its grammar differs: 

pic    = branch | color | move | scale | line‡
branch = pic pic      : put(pic) put(pic2)
color  = newcolor pic : docolor(pic,curcolor,newcolor)
move   = x y pic      : dotr(pic,ta,tb,tc+x,td,te,tf+y)
scale  = x y pic      : dotr(pic,x*ta,x*tb,x*tc,y*td,y*te,y*tf)
line   = points       : doline(points)

That is, a picture is a line or a command to color, scale, or 
move a subpicture. branch does nothing; it merely
allows two subpictures to inherit one set of attributes. The 
semantic routines are docolor, dotr, and doline. docolor 
calls sds’ display routine to format a picture and surrounds 
the result with control codes that switch first into color 
newcolor and then back to old color curcolor. dotr adjusts 
some global variables — ta-tf, which define a transforma- 
tion that doline applies to all points before connecting 
them with lines — displays a subpicture, and restores the 
original transformation. Code to initialize the global vari- 
ables and to define docolor, dotr, and doline is appended to 
this grammar, and sds’ preprocessor compiles the result 
into a list of record declarations, a syntax checker, and a 
display routine. The resulting editor makes it fairly easy to 
create and edit simple pictures. For example, the com- 
mand sequence 

.branch
.color
red
.line
0,0 100,100
.color
blue
.line
0,100 100,0

draws a large ‘X’ with a red rising stroke and a blue falling
stroke, and

home
down
down
green

makes the red stroke green. Again, a written trace is a poor
substitute for a demonstration, because sds would have
been changing the display with each command to show the
path to, and contents of, the current record as it changes.

‡ Nonterminals whose definitions use only alternation are
omitted from sds’ parse trees to avoid clutter. Accordingly,
productions involving only alternation have no semantic
action.
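
The way move and scale adjust the six transformation variables can be sketched in Python. The paper says only that ta through tf define a transformation applied by doline; the reading used here, x' = ta*x + tb*y + tc and y' = td*x + te*y + tf, is an assumption, as are the function names. The argument patterns of `move` and `scale` follow the dotr calls in the grammar.

```python
# Hypothetical sketch of the transformation handled by dotr and doline.
IDENTITY = (1, 0, 0, 0, 1, 0)   # (ta, tb, tc, td, te, tf)


def apply(t, point):
    """Transform one point (assumed affine interpretation of ta..tf)."""
    ta, tb, tc, td, te, tf = t
    x, y = point
    return (ta * x + tb * y + tc, td * x + te * y + tf)


def move(t, x, y):
    """As in: dotr(pic,ta,tb,tc+x,td,te,tf+y)."""
    ta, tb, tc, td, te, tf = t
    return (ta, tb, tc + x, td, te, tf + y)


def scale(t, x, y):
    """As in: dotr(pic,x*ta,x*tb,x*tc,y*td,y*te,y*tf)."""
    ta, tb, tc, td, te, tf = t
    return (x * ta, x * tb, x * tc, y * td, y * te, y * tf)


# A scale record nested inside a move record adjusts the moved transformation.
t = scale(move(IDENTITY, 10, 20), 2, 3)
print(t, apply(t, (1, 1)))
```

Because each record derives its transformation from the one in force when it is displayed, restoring the original values afterwards keeps sibling subpictures independent.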

Because .pick copies pointers, not subtrees, it can 
violate tree structure. Though this feature is dangerous, 
users editing structures like the graphics structure may find 
it handy. By picking a structure and putting it down in 
several places, all instances of that structure may be 
changed by changing the single copy. (Because this feature 
is not universally desirable, it would be better if sds offered 
both copying and non-copying .pick commands). This 
observation raises a larger issue: sds can be made to edit 
arbitrary graph structures. Semantic actions for cyclic 
structures would have to take care to avoid loops, and the 
.r and .w commands would have to use a different encod- 
ing, but sds does not otherwise assume that it is editing a 
tree. 
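
The difference between a pointer-copying pick and a subtree-copying one is easy to demonstrate. The Python fragment below is illustrative only; the dictionary records are hypothetical stand-ins for sds' records, and `copy.deepcopy` plays the role of the copying .pick that the text suggests sds should also offer.

```python
# Hypothetical sketch: sharing one record in two places versus copying it.
import copy

line = {"type": "line", "points": [(0, 0), (100, 100)]}
stroke = {"type": "color", "newcolor": "red", "pic": line}

# Non-copying put: both fields of the branch refer to the same record.
shared = {"type": "branch", "pic": stroke, "pic2": stroke}
shared["pic"]["newcolor"] = "green"
print(shared["pic2"]["newcolor"])          # the other instance changes too

# Copying put: the second field receives an independent subtree.
original = {"type": "color", "newcolor": "red", "pic": line}
separate = {"type": "branch", "pic": original, "pic2": copy.deepcopy(original)}
separate["pic"]["newcolor"] = "blue"
print(separate["pic2"]["newcolor"])        # the copy keeps its own color
```

The shared form is no longer a tree, which is why a prefix-form file encoding cannot represent it without additional conventions.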

The graphics editor is incomplete. For example, it can 
neither rotate pictures, clip them to fit the screen, nor 
present objects other than lines (e.g., filled polygons, 
curves). All of these features are easily added by extending 
the grammar and code generation routines, but some 
features resist this attack. For example, sds’ command 
language offers no way to enter coordinates by pointing 
instead of by typing numbers, so this change would have to 
be made to sds, not the grammar. A general solution to 
this problem may be to have one grammar that defines the 
structure and another that defines the command language. 


3. Discussion 

sds is written in SNOBOL4, though it could have been 
implemented in a more conventional, compiled language 
— interpretation and garbage collection are handy but not 
required. Its data-independent code is roughly 200 lines 
long. The syntax descriptions (with semantic routines) for 
the structures described above are 15-40 lines each, and 
they are compiled into code roughly 2-3 times that length. 
An editor like the binary-tree editor can be brought up in 
about an hour by someone familiar with sds. The semantic 
actions require most of the effort. Once they are finalized,
the editor may be changed quickly. For example, when 
writing this paper it became obvious that it would be easier 
to describe Section 2.2’s document structure than the origi- 
nal one: 

paper = title sects : center(title) nl nl put(sects)
sects = sect sects  : put(sect) put(sects)
sect  = header pps  : header nl nl put(pps)
pps   = pp pps      : put(pp) put(pps)
pp    = text        : break(text) nl

While this change resulted in changes to many lines of
(generated) code, the ability to change only the syntax
description allowed it to be completed in five minutes.

sds is loaded with several short SNOBOL4 routines 
that handle the terminal interface. Thus text-based editors 
like the document and C editors run on several models of 
terminals, though the tree and graphics editors run on only 
one model because their semantic actions assume that 
model’s control sequences. 

sds is experimental. It still needs thorough testing, 
optimization, and documentation, and many aspects need 
polishing. For example, sds’ cavalier screen refreshing 
would be tedious were communications slow; it would be 
better to display more context than just the current record 
and then indicate the current record by highlighting it or by 
pointing the cursor at it. While the user interface needs 
work, more complete versions of the document and graph- 
ics editors, and attacks on new data structures, are likely to 
produce more interesting results. 

Acknowledgment 

This work has benefited greatly from discussions with 
Dave Hanson. 


References

G. Coulouris et al. The design and implementation of an
interactive document editor. Software: Practice and
Experience 6(2):271-279, April 1976.

C. Ellis and G. Nutt. Office information systems and
computer science. ACM Computing Surveys 12(1):27-60,
March 1980.

C. Fraser. A generalized text editor. Communications of
the ACM 23(3):154-158, March 1980.

C. Fraser and A. Lopez. Editing data structures. To
appear in ACM Transactions on Programming Languages
and Systems.

E. Irons and F. Djorup. A CRT editing system.
Communications of the ACM 15(1):16-20, January 1972.

S. Johnson. YACC: yet another compiler-compiler.
Technical report, Bell Labs, Murray Hill, NJ, 1975.

B. Kernighan and D. Ritchie. The C Programming
Language. Prentice-Hall, 1978.

J. Ossanna. Troff user's manual. Technical report, Bell
Labs, Murray Hill, NJ, 1977.

B. Reid. A high-level approach to computer document
formatting. Conference Record of the Seventh Annual ACM
Symposium on Principles of Programming Languages:24-31,
January 1980.

E. Sandewall. Programming in the interactive environment:
The LISP experience. ACM Computing Surveys
10(1):35-71, March 1978.

A. Shaw. A model for document preparation systems.
Technical report 80-04-02, Department of Computer
Science, University of Washington, April 1980.

A. Shaw, R. Furuta, and J. Scofield. Document formatting
systems: Survey, concepts, and issues. Technical
report 80-10-02, Department of Computer Science,
University of Washington, October 1980.

T. Teitelbaum. The Cornell program synthesizer: A
tutorial introduction. Technical report 79-381,
Department of Computer Science, Cornell, 1980.

A. van Dam and D. Rice. On-line text editing: A survey.
ACM Computing Surveys 3(3):93-114, September 1971.







The Implementation and Experiences of a 
Structure-Oriented Text Editor. 

O. Strömfors and L. Jonesjö

Software Systems Research Center, Linköping University, Sweden


ABSTRACT: This paper presents a generalized approach to data editing in interactive
systems. We describe the ED3 editor, which is a powerful tool for text editing combining
the ability to handle hierarchical structures with screen-oriented text editing facilities.
Extensions for handling simple pictures and formatted data records in a uniform way are
part of our approach. Examples of ED3 applications are presented.


1. INTRODUCTION. 

An increasingly important facility in interactive computer systems 
is the ability to organize, present and update collections of data. 
We will use the term editor in a generalized sense to denote a 
program supporting this process, independently of whether we 
are performing text editing, database query and manipulation, or 
interactive graphics. 

The fact that there are striking similarities in all these kinds of
data editing has not led to many unified approaches to the
editing task. This situation provided us with a motivation for 
undertaking a project aiming at the design and implementation 
of such a universal editor, supporting processing of text, pictures 
and formatted data records in a uniform way. The ability to 
handle explicit hierarchical structures was considered an 
important part of this editor. 

The rest of this paper will present the implementation and 
experiences of a prototype version of such an editor. This system 
has been named ED3, and is primarily designed for manipula- 
tion of hierarchically structured texts. An experimental extension 
allowing graphics editing of hierarchically structured pictures in 
a uniform way has also been implemented. The work is being 
done as a continuation of the LOIS project for investigation of 
new structures for office information systems [SAN80].

It seems to us that very few efforts have been made to
systematize and explore the realm of data editing as the primary
interface to an interactive computer facility. Some isolated
attempts to generalize the concept of editing have been made.
For instance, a recent paper recognized the possibility for
different editing tasks to share a text editor's command language
[FRA80]. But very few approaches to the design of
an integrated tool capable of at least the three editing tasks
mentioned above have been presented [BUR80].

Permission to copy without fee all or part of this material is granted
provided that the copies are not made or distributed for direct
commercial advantage, the ACM copyright notice and the title of the
publication and its date appear, and notice is given that copying is by
permission of the Association for Computing Machinery. To copy
otherwise, or to republish, requires a fee and/or specific permission.

© 1981 ACM 0-89791-043-5/81/0600/0022 $00.75

When designing such an editing system, there are a few things 
not to be forgotten. Text editing will usually be the most 
frequent task, so it has to be done very efficiently. The user's
model must be kept simple. Otherwise we may have a nice
concept but no users for our tool. 

Text editors have been used for nearly thirty years and are today 
probably the most important interface to the computer system. 
Though the first editors were very simple, often run through 
cards, a rapid development took place as terminals were put on 
line. Searching and substitution operations were improved and 
macro facilities included. Indeed, by the mid sixties, some text 
editors had full programming language capabilities [DAM71]. 

An early major effort treating problems addressed in this paper
was the big NLS project [ENG68]. Here benefits came from
taking the inherent structure of texts into account. This gave the
user several view options of a document. He could e.g. choose to
display just the skeleton, perhaps the section headers, or look at
everything in some paragraph. Actually much more came out of
this. It was the embryo of what would later be called an office
information system.

The best editors on the market today are streamlined to run on
video terminals. They are screen-oriented, i.e., the current part of
the text is automatically kept on screen and every change on the
screen immediately updates the mirrored text. Although they do not
have structural aids as indicated above, they are aware of
concepts such as paragraphs, sentences and words.

Until then, editors had either been line-oriented or treated the
text as a string of characters. While the former is very convenient
for program editing tasks, we believe that the latter gives more
flexibility for working with ordinary texts.

The design of the editor command language is of course a very
crucial problem. One argument for natural-language-like
commands may be that these are easier to learn [LED80].
However, we stress again that editing is such a common task that
the commands must be very terse, keeping typing overhead at a
minimum [CAR80]. Further, in designing the command interface
for a universal editor, the question of proper integration arises. It
is then important, for example, to have a carefully selected set of
commands which are uniformly applicable to different kinds of
data objects with a reasonable semantic interpretation.

2. THE ED3 EDITOR. 

The design and implementation of the ED3 editor was the first
step in an iterative design process, aiming at a fully generalized
editor. Considering this version mainly a test bed for testing the
viability of our design decisions, we confined ourselves to building
a structure-oriented text editor.


The idea of such an editor is to superimpose a tree structure onto
the text. Drawing an analogy from books, this tree acts as a table
of contents. There are, however, differences. Changes in the tree
structure cause the text parts to be correspondingly reordered.
The text part to be edited is selected by traversing the tree
structure (the table of contents) using menus automatically
produced from the current structure, and then entering text editing
mode when a leaf is reached.

The ED3 concept is put to work through cooperation of a tree
editor and a text editor. ED3 gets its special flavor from the
selection of command languages and its presentational aids. These
and related things will be discussed below.

2.1 Tree Structure

The present version of ED3 has two types of nodes, tree nodes
and text nodes. Other node types have been added in the
experimental version for interactive graphics as described in
section 4.1 below. It is our intention that all structure editing
commands shall be independent of the type of the objects in the
tree, which can be texts, figures or data records.

A text node is a sequence of an arbitrary number of characters.
All 128 codes from the ASCII character set are allowed.

A tree node is represented as a list containing a head and a
sequence of zero or more references to other nodes, its subnodes.
The subnodes can be of different types. Some can be text nodes
and some can be tree nodes. If a subnode is a tree node it is the
root of a subtree.

In ED3 the node head is a text node. This makes it possible to put
arbitrary information in the head. Later versions of ED3 will
support different applications by having control information in
the tree nodes which is not part of the text. For example, node
names, formatting commands for the subtree, comments about the
text etc. can be useful in a text processing system.

How the structure is used depends on the application. This point
is further explained in section 4. For text processing it can be
appropriate to have a paragraph in each text node. The tree
structure then shows how these form sections and chapters. The
node heads can contain headers and formatting commands.

The next sections describe operations for presentation and
modification of structural relationships. The examples describe tree
nodes and text nodes but there is no difference in the structure
editing depending on whether the leaves in the tree are text nodes
or something else.

2.2 Structure Presentation

The terminal screen is used to display the structure of a tree or
the text in a node. If the current node is a tree node, the first line
of the head and the first line of each subnode are shown. The
subnodes are numbered and these numbers can be used as
commands or as arguments to a command.

If the current node is a text node the whole text is displayed. The
screen is updated after every change in structure or text, or if
another node is selected as the new current node.
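
The two node types of section 2.1, and the one-level view of a tree node (the first line of the head followed by the numbered first line of each subnode), can be sketched as follows. This is only an illustration in Python, not the Pascal implementation of ED3; the names `TextNode`, `TreeNode`, `first_line` and `menu` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TextNode:
    """A leaf: an arbitrary sequence of characters."""
    text: str = ""


@dataclass
class TreeNode:
    """A head (itself a text node) plus zero or more subnodes of any type."""
    head: TextNode
    subnodes: List["Node"] = field(default_factory=list)


Node = Union[TextNode, TreeNode]


def first_line(node: Node) -> str:
    """First line of a text node, or of the head of a tree node."""
    text = node.text if isinstance(node, TextNode) else node.head.text
    return text.split("\n", 1)[0]


def menu(node: TreeNode) -> List[str]:
    """One-level view: the head line, then one numbered line per subnode."""
    lines = [first_line(node)]
    for number, sub in enumerate(node.subnodes, start=1):
        lines.append(f"  {number}. {first_line(sub)}")
    return lines


# Hypothetical example document.
guide = TreeNode(TextNode("ED3 - User's Guide"), [
    TextNode("Introduction\nmore text"),
    TreeNode(TextNode("Tree Structure"), [TextNode("Nodes")]),
])
print("\n".join(menu(guide)))
```

Because the head is an ordinary text node, the same `first_line` helper serves both node types, which is one way to keep the display code independent of what the leaves contain.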


    ED3 - User's Guide...       <-- Current node
       1. Introduction...       <-- Its four subnodes
       2. Tree Structure...
       3. Tree Editor...
       4. Text Editor...

    >5                          <-- Previous command
    >                           <-- Command line
    ?NO SUCH NODE               <-- Error message


Fig 1. Example of screen layout

All commands to the tree editor are relative to the current node,
which is always the node shown on the screen.
Commands to visit different parts of the tree are actually
commands that change the current node and in this way shift the
editor's attention to another part of the tree.

Tree traversal is done by menu selection. An integer as a 
command selects the corresponding subnode as current node. 


    Tree Editor...
       1. Commands that shows stru...
       2. Attention changing comma...
       3. Commands that modify str...
       4. Commands to move and cop...
       5. Commands to read or writ...
       6. Commands for program fil...
    --MORE--
    >5
    >3


Fig 2. Subnode three is selected

When a node has many subnodes it is not possible to show all the
subnodes. The same problem appears if the current node is a text
with many lines. In these cases the text --MORE-- is written on
the screen. To see the next page the space bar is used. The screen
in the examples has only 11 lines. A real screen usually has at
least 24 lines.

Only a few commands are needed to make it easy for the user to
walk around in a tree and get a general view of a structured
document. There are commands to select the nth subnode (integer
n>0), go up to the root node (^), go one level up (0), and go to the
next (NX) and the previous (BK) node on the same level in the tree.
The idea and the names of these commands come from the
structure editor in the INTERLISP programming system [TEI78]
[SAN78].
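
The attention-changing commands can be sketched as operations on a path from the root to the current node. The sketch below is an assumption-laden illustration, not ED3 code: trees are modelled as nested Python lists `[head, subnode, ...]` with strings as text nodes, the class and method names are hypothetical, and raising an error when NX or BK runs off the end of the sibling list is our guess at the behaviour.

```python
class NoSuchNode(Exception):
    """Raised when a command points outside the tree (cf. ?NO SUCH NODE)."""


class Cursor:
    """Tracks the current node as a path of subnode indices from the root.

    Trees are nested lists [head, subnode, subnode, ...]; a leaf is a str.
    """

    def __init__(self, root):
        self.root = root
        self.path = []                  # e.g. [2, 0]: first subnode of the third

    def _walk(self, path):
        node = self.root
        for index in path:
            node = node[index + 1]      # slot 0 holds the head
        return node

    def current(self):
        return self._walk(self.path)

    def down(self, n):                  # the integer command n (n > 0)
        node = self.current()
        if not isinstance(node, list) or not 1 <= n < len(node):
            raise NoSuchNode(n)
        self.path.append(n - 1)

    def up(self):                       # the 0 command
        if not self.path:
            raise NoSuchNode("above root")
        self.path.pop()

    def to_root(self):                  # return to the root node
        self.path.clear()

    def _sideways(self, step):          # shared by NX and BK
        if not self.path:
            raise NoSuchNode("root has no siblings")
        parent = self._walk(self.path[:-1])
        target = self.path[-1] + step
        if not 0 <= target < len(parent) - 1:
            raise NoSuchNode(target + 1)
        self.path[-1] = target

    def next(self):                     # NX
        self._sideways(+1)

    def previous(self):                 # BK
        self._sideways(-1)


guide = ["ED3 - User's Guide", "Introduction", "Tree Structure",
         ["Tree Editor", "Show commands", "Attention commands"], "Text Editor"]
cursor = Cursor(guide)
cursor.down(3)      # into "Tree Editor"
cursor.down(1)      # its first subnode
cursor.next()       # NX: on to "Attention commands"
```

Keeping only a path, rather than parent pointers in the nodes, means that every command stays local to the current node, which matches the way the commands are described above.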









In addition to the automatic display of one level of subnodes,
there is a command to show several levels of the tree (P). This
command normally displays three levels of the current subtree,
but the print level is a parameter that can be changed. The
indentation of node numbers is used to indicate how the tree
nodes are connected.


    ED3 - User's Guide...
       1. Introduction...
       2. Tree Structure...
       3. Tree Editor...
          1. Commands that shows ...
          2. Attention changing c...
          3. Commands that modify...
          4. Commands to move and...
    --MORE--
    >^
    >P


Fig 3. The P Command

If the current node is a text node the whole text is displayed.
Sometimes the user wants to see all the text in a subtree. The
command for this (;P) suppresses all information about the tree
structure and shows what the text would look like if all the nodes
in the subtree were merged.
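
The two views can be sketched as a pair of recursive functions. This is a hedged illustration rather than ED3's own algorithm: trees are modelled as nested lists `[head, subnode, ...]` with strings as text nodes, the function names and the indentation width are hypothetical, and including the node heads in the merged text is an assumption.

```python
def show(node, levels=3, indent=0, number=None):
    """Lines for `levels` levels of a subtree, indented by depth (like P)."""
    is_tree = isinstance(node, list)
    title = (node[0] if is_tree else node).split("\n", 1)[0]
    prefix = "" if number is None else f"{number}. "
    lines = [" " * indent + prefix + title]
    if is_tree and levels > 1:
        for n, sub in enumerate(node[1:], start=1):
            lines += show(sub, levels - 1, indent + 3, n)
    return lines


def merged_text(node):
    """All text of a subtree in depth-first order, structure suppressed (like ;P)."""
    if isinstance(node, list):
        return "".join(merged_text(part) for part in node)
    return node


tree = ["Guide", "Intro", ["Editor", "Show", ["Deep", "X"]]]
print("\n".join(show(tree)))
```

With the default print level of three, the node `"X"` on the fourth level is not listed, while `merged_text(tree)` still contains it.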

2.3 Structure Editing 

All structure editing commands are relative to the current node.
Therefore the commands to delete a subtree (D), insert a node
after (IA) or before (IB) the current node, and join the current
node with the next (J) take no parameters. The command to change
the order of subnodes (SW) is one of the two commands where more
than one node has to be specified.


    ED3 - User's Guide...
       1. Introduction...
       2. Tree Structure...
       3. Text Editor...
       4. Tree Editor...

    >SW 3 4


Fig 4. Switch subnodes three and four.

In order to move or copy a subtree two points in the tree must be
specified. These points can be far from each other. This problem
is solved by having two trees for a while. First the current subtree
is moved or copied to the second tree. Any sequence of attention
changing commands can then be used to reach the position where
the saved tree is to be inserted. This means that two local
operations are performed instead of one referencing two different
points in the tree.
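
A minimal sketch of this two-step scheme, under stated assumptions: trees are nested Python lists `[head, subnode, ...]`, the second tree is held in a single `register` slot, and the class and method names are hypothetical rather than ED3's own.

```python
import copy


class StructureEditor:
    """Structure edits on nested-list trees [head, subnode, ...]."""

    def __init__(self):
        self.register = None            # the temporary second tree

    def switch(self, parent, m, n):
        """Exchange the mth and nth subnodes of `parent` (1-based)."""
        parent[m], parent[n] = parent[n], parent[m]

    def move_out(self, parent, n):
        """First local operation: move subnode n into the register."""
        self.register = parent.pop(n)

    def copy_out(self, parent, n):
        """Alternative first operation: save a copy, leaving the tree intact."""
        self.register = copy.deepcopy(parent[n])

    def insert_after(self, parent, n):
        """Second local operation: insert the saved tree after subnode n."""
        parent.insert(n + 1, copy.deepcopy(self.register))


editor = StructureEditor()

guide = ["ED3 - User's Guide", "Introduction", "Tree Structure",
         "Tree Editor", "Text Editor"]
editor.switch(guide, 3, 4)

doc = ["Doc", ["Part A", "a1", "a2"], ["Part B", "b1"]]
editor.move_out(doc[1], 2)        # "a2" leaves Part A ...
editor.insert_after(doc[2], 1)    # ... and arrives after "b1" in Part B
```

Since slot 0 of each list holds the head, the 1-based subnode number doubles as the list index, and neither operation needs to know where the other end of the move is.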


A new tree node is created by grouping the nodes
that are to be its subnodes. There is also a command to split a tree
node into its subnodes. The edit command (E) is used to change
the information in a node. Depending on the type of the current node,
the appropriate editor is used. For each node type in ED3 there
must be an editor, like the text editor for the text nodes. The head
is edited if the current node is a tree node.

[All tree editing commands are listed in appendix.] 

There is also a set of commands to communicate with the file
system. The whole tree or a subtree can be saved on files with or
without the tree structure. The structure is omitted if the purpose
of the file is to communicate with other programs. Files with
structure can later be loaded as a subtree at any position in the tree.
Files without the structure information can be loaded as a text
node. The user can then split the text and build a structure over
it.

2.4 The Text editor 

The text editor is screen-oriented, having a part of the text
presented in a window, and makes extensive use of control
characters. Most editing is done by moving around in this
window, automatically inserting normal characters at the place of
the cursor. However, some operations, e.g. string replacement, are
not conveniently expressed in this mode, and therefore it is possible
to change to command mode, where the command is shown on a
dedicated row.


    The present version of ED3      |
    has two types of nodes, tree    |  Text window
    and text nodes._                   <-- Current position

    *sfoo$                             <-- Command line
    ?SEARCH FAILED: 'foo'              <-- Error message
    >5
    >E


Fig 5. Example of screen layout

The text editor has two commands to create new text nodes, one
manual and one more automatic. The "new node"-command
(control-N) splits the text at the current position and inserts the first
part of the text before the current node while the second part is kept
in the text editor. A sequence of text nodes is therefore created if
the user just enters the text and gives the control-N command
when a new node is wanted. The "guess"-command (G) on the
command line is used to divide a text node depending on the
characters in the text. The text is divided after empty lines or
form feeds depending on a parameter. This command is useful if
a file without structure has been loaded.
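
The splitting rule of the "guess"-command can be sketched with a regular expression. This is an illustration only: the function name and the `separator` parameter values are hypothetical, and treating an "empty line" as two consecutive newline characters (so that a line containing only blanks does not count) is an assumption.

```python
import re


def guess_split(text, separator="empty_line"):
    """Divide text after each run of empty lines, or after each form feed.

    Every piece keeps its trailing separator, so joining the pieces
    gives back the original text unchanged.
    """
    if separator == "form_feed":
        pattern = r"(?<=\f)"
    else:
        # Split after a blank line, but only once the run of newlines ends.
        pattern = r"(?<=\n\n)(?=[^\n])"
    return [piece for piece in re.split(pattern, text) if piece]


sample = "First paragraph.\n\nSecond paragraph.\n\n\nThird."
pieces = guess_split(sample)
```

Splitting on zero-width matches as done here requires Python 3.7 or later.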

3. IMPLEMENTATION ISSUES 

Our first intention with ED3 was to design and implement a
system containing text buffer and file management and the
functions that operate on the tree structure. This basic system was
used for experiments with different command languages and with the
presentation of structure and text. The text editor and better
algorithms for screen update were added later.

Since Pascal is available on many small computers it was selected 
as implementation language to obtain portability. Only a small 
number of low level input and output functions are written in 
machine language. 




ED3 was developed on a DECsystem-20 computer and is now used
on DECsystem-20 and DECsystem-10. Recently it has been moved
to the PDP-11 under the RSX-11M operating system, and work on an
implementation under UNIX has begun. The next three sections
describe the design of some time- and storage-critical parts.

Text buffer

Every text node is referenced from a tree node or is on the free list.
All texts are in one big buffer and a text node contains the length
of the text and a pointer into the text buffer. Since texts of
variable length are inserted, deleted and changed, an efficient way
to compact the buffer and reclaim unused space is needed.
Therefore all text nodes in the tree are also linked together in
the same order as they have space allocated in the buffer.

[Figure: text nodes and the text buffer]

Fig 6. Text nodes and text buffer.

The algorithms for buffer management are simple. New text
nodes are always added at the end of the buffer. If there is not
enough free space the buffer is compacted. Before a text is edited
it is copied to the end of the buffer. This makes it possible for
the text to grow and it also gives the user a chance to get back the
old version of the node. To delete a text node, it is simply removed
from the list of active texts and put on the free list.

The algorithm to compact the buffer is also simple. The list of
active text nodes is traversed once and the texts in the buffer
are moved so that they follow directly after each other.
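
The buffer scheme can be sketched as follows. This is a simplified illustration under our own assumptions, not the ED3 code: a text node is modelled as a two-element list `[offset, length]`, the free list of node records is left out, and the class and method names are hypothetical.

```python
class TextBuffer:
    """One big buffer; each text node is an [offset, length] record."""

    def __init__(self, capacity):
        self.data = bytearray(capacity)
        self.end = 0                # first free position
        self.active = []            # nodes in the order of their buffer positions

    def add(self, text: bytes):
        """New texts always go at the end; compact first if space is short."""
        if self.end + len(text) > len(self.data):
            self.compact()
            if self.end + len(text) > len(self.data):
                raise MemoryError("text buffer full")
        node = [self.end, len(text)]
        self.data[self.end:self.end + len(text)] = text
        self.end += len(text)
        self.active.append(node)
        return node

    def read(self, node) -> bytes:
        offset, length = node
        return bytes(self.data[offset:offset + length])

    def delete(self, node):
        """Deleting only unlinks the node; its space is reclaimed later."""
        self.active = [other for other in self.active if other is not node]

    def compact(self):
        """One pass over the active list, sliding each text towards the start."""
        free = 0
        for node in self.active:
            offset, length = node
            self.data[free:free + length] = self.data[offset:offset + length]
            node[0] = free
            free += length
        self.end = free


buf = TextBuffer(10)
a = buf.add(b"abcd")
b = buf.add(b"efg")
buf.delete(a)
c = buf.add(b"hijklm")      # does not fit until the buffer is compacted
```

Because the active list is kept in buffer order, a text is never moved onto data that has not yet been moved, which is what lets the compaction run in a single pass.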

On the DECsystem-20 we rely on the efficient paging mechanism in the
operating system and have the whole text buffer in virtual
memory. On minicomputers we use a file for the text buffer and
have two or more smaller buffers in primary memory.

Tree nodes

A tree node is implemented as a linked list. The first list element
contains a pointer to the head, followed by a list of zero or more
subnodes. Each list cell contains two pointers and a type tag.



[Figure: the list cells of a tree node, each pointing to a text or tree node]

Fig 7. A tree node with three subnodes.
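
The list-cell representation can be sketched as below. The field names, the tag values and the helper functions are hypothetical, and text nodes are stood in for by Python strings; the sketch only mirrors the description of two pointers and a type tag per cell.

```python
from dataclasses import dataclass
from typing import Optional

TEXT, TREE = "text", "tree"


@dataclass
class Cell:
    tag: str                        # type of the node that `item` points to
    item: object                    # pointer to a text or to another cell chain
    rest: Optional["Cell"] = None   # pointer to the next list cell


def make_tree(head, *subnodes):
    """Build the cell chain of a tree node; the first cell points to the head."""
    first = last = Cell(TEXT, head)
    for sub in subnodes:
        tag = TREE if isinstance(sub, Cell) else TEXT
        last.rest = Cell(tag, sub)
        last = last.rest
    return first


def subnodes(tree):
    """Yield (tag, node) for every subnode, skipping the head cell."""
    cell = tree.rest
    while cell is not None:
        yield cell.tag, cell.item
        cell = cell.rest


inner = make_tree("SUB1", "SUB1A", "SUB1B")
outer = make_tree("HEAD", inner, "SUB2")
```

The tag in each cell lets the tree editor follow a chain without inspecting the object it points to, which is what makes it possible to add further node types later.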


Storing tree structures on files 

A tree structure is stored on a sequential text file. The structure
is written first, with parentheses representing tree nodes and the
number of characters in each text node written as a decimal integer.
After the structure all texts in the tree are written in depth-first
order.


    Tree:                 Corresponding text file:

    A                     (1 3 2) ABCDEF
       1. BCD
       2. EF

    HEAD                  (4 (4 5 5) 4) HEADSUB1SUB1ASUB1BSUB2
       1. SUB1
          1. SUB1A
          2. SUB1B
       2. SUB2

Fig 8. External format of two simple trees.

This external format for the structure is used to make the loading
fast. The tree structure is built when the structure part of the file
is read. Each new text node points to the place in the text buffer
where the text will later be loaded. After the structure part is read
the rest of the file can be moved directly into the text buffer.
Other file formats handled by ED3 are program files where the
structure information is stored in comments and text files without
structure.
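
Writing and loading this external format can be sketched in a few lines. The sketch makes several assumptions that are not stated in the text: trees are nested Python lists `[head, subnode, ...]` with strings as text nodes, a single space separates the structure part from the texts (as the first tree of Fig 8 suggests), and the function names are hypothetical.

```python
def dump(node):
    """Return (structure, texts) for a nested-list tree [head, subnode, ...]."""
    if isinstance(node, str):
        return str(len(node)), node
    parts = [dump(part) for part in node]
    structure = "(" + " ".join(s for s, _ in parts) + ")"
    return structure, "".join(t for _, t in parts)


def save(tree):
    structure, texts = dump(tree)
    return structure + " " + texts      # assumed single-space separator


def load(data):
    """Build the tree from the structure part, then slice the texts in order."""
    pos = 0

    def parse():
        # Pass 1: read parentheses and lengths; leaves hold their text length.
        nonlocal pos
        if data[pos] == "(":
            pos += 1
            items = []
            while data[pos] != ")":
                if data[pos] == " ":
                    pos += 1
                else:
                    items.append(parse())
            pos += 1
            return items
        start = pos
        while data[pos].isdigit():
            pos += 1
        return int(data[start:pos])

    def fill(item):
        # Pass 2: replace every length by the next slice of the text part.
        nonlocal pos
        if isinstance(item, list):
            return [fill(sub) for sub in item]
        text = data[pos:pos + item]
        pos += item
        return text

    shape = parse()
    pos += 1                            # skip the assumed separator
    return fill(shape)


print(save(["A", "BCD", "EF"]))
```

Since the texts are located purely by their lengths, they may themselves contain digits, spaces or parentheses without confusing the loader.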

4. EXPERIENCES WITH THE EDITOR. 

In this section we will summarize the present status of the ED3
implementation and indicate the directions of extensions planned.
In addition we will substantiate the usefulness of the ED3
approach to editing by presenting a number of typical
applications of ED3, as it has been used within our lab.

4.1 Implementation status. 

As was detailed in the previous section, ED3 is presently running
on DECsystem-20, DECsystem-10 and PDP-11. Users are still
mainly staff at our lab, but the system is also available on some
other computers in the university environment.

There is also an experimental implementation of a version of
ED3 containing support for picture management. Node types
have been added for the graphic information. A picture node in
ED3 is a tree structure built of subpictures. For each subpicture
position, scaling and rotation are given. These parameters for the
subpicture are defined relative to the level above it and are
inherited down the structure. Each subpicture can either be defined in
terms of points and lines or be a picture node with its own
subpictures. Libraries with standard picture elements can be
loaded and the same picture element can be referenced several
times with different scaling and orientation. When a document
containing graphic nodes is edited on an ordinary text terminal, it
is of course impossible to study the pictures, but the user can still
move them, see the names of the subpictures etc. This can be
important if there are many text terminals, but only a few
graphic ones.
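
How position, scaling and rotation accumulate down a picture tree can be sketched as follows. This is our own illustration, not the experimental ED3 code: the class names `Picture` and `Placement`, uniform scaling, and rotation measured in radians are all assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Picture:
    """A picture node: its own lines plus placed references to subpictures."""
    lines: List[Tuple[Point, Point]] = field(default_factory=list)
    parts: List["Placement"] = field(default_factory=list)


@dataclass
class Placement:
    """One reference to a subpicture, positioned relative to the level above."""
    picture: Picture
    position: Point = (0.0, 0.0)
    scale: float = 1.0
    rotation: float = 0.0           # radians, counter-clockwise


def place(point, origin, scale, rotation):
    """Rotate and scale `point`, then shift it to `origin`."""
    x, y = point
    c, s = math.cos(rotation), math.sin(rotation)
    return (origin[0] + scale * (c * x - s * y),
            origin[1] + scale * (s * x + c * y))


def flatten(picture, origin=(0.0, 0.0), scale=1.0, rotation=0.0):
    """All lines of a picture tree in top-level coordinates."""
    result = [(place(a, origin, scale, rotation), place(b, origin, scale, rotation))
              for a, b in picture.lines]
    for part in picture.parts:
        # The part inherits the accumulated transform of the levels above it.
        result += flatten(part.picture,
                          place(part.position, origin, scale, rotation),
                          scale * part.scale,
                          rotation + part.rotation)
    return result


arrow = Picture(lines=[((0.0, 0.0), (1.0, 0.0))])       # a library element
scene = Picture(parts=[Placement(arrow, (10.0, 0.0), 2.0),
                       Placement(arrow, (0.0, 5.0), 1.0, math.pi / 2)])
segments = flatten(scene)
```

Keeping the transform in the reference rather than in the picture itself is what allows the same library element (`arrow` here) to appear several times with different scaling and orientation.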

The present work is performed along several lines of
development. Enhanced text editing facilities are being considered
for the ED3 prototype, and at the same time the proper
implementation decisions for a mini, i.e. the PDP-11, are being
worked out. Meanwhile, the experiments with the design of graphics
primitives are carried on in a parallel implementation. The major
extensions of highest priority are the handling of formatted data
records within ED3 and the handling of references to external data,
such as pointers to ordinary files or database retrieval requests, as
additional node types in an ED3 tree.

4.2 Application scenarios. 

Below we will give a short account of some typical tasks which 
ED3 has been used for so far. We feel that these examples 
provide a spectrum of applications, which display the appro- 
priateness of combining a powerful text editor with a hier- 
archically structured interface, as realized in the ED3 system. 

Preparation of structured documents 

The most immediate application of ED3 is for writing large text 
documents in a structured fashion. The natural structure of such 
a document may be its division into chapters and sections, etc., but 
different conventions may be appropriate depending on the 
current application. Although this kind of structured writing and 
maintenance of a large document may be realized in a reasonable 
way with the help of the ordinary file system and some simple 
editing conventions, it is however important to remember that 
ED3 was conceived as a prototype for an editor with the capability 
to handle pictures and formatted database records as parts of a 
document as well. 

Thus the main advantage with the ED3 approach in this connection
is the possibility to integrate text preparation with interactive
graphics editing and production of database reports. The basic 
primitives for handling the structure part of an ED3 file were 
chosen in order to provide a common uniform user interface for 
accessing and manipulating leaves containing text, pictures or 
data. 

Personal database 

In an environment where computers are readily available as
personal tools, as is envisioned in many application areas
today, one important service is the maintenance of files of
personal notes and memos. Conventional computer systems provide
this kind of service essentially through the operating
system's file directory, using text editing for creation of files as the
unit of entries in the personal information store. In addition the
user may have at his disposal a database facility with an
interactive query (and data manipulation) interface. The latter
type of tool is however often heavily oriented towards rigidly
formatted data records, imposing severe limitations on its
usefulness for the purpose of managing personal notes and
documents.

The ED3 approach has proved well suited to
this type of filing of personal notes and memos. For instance, we
often use it to keep files with literature references, to be used for
e.g. compilation of bibliographies or reference lists in research
papers. The tree then provides a structuring of the references by
subject and access paths for selection of leaves appropriate for
a specific purpose.

It should be mentioned that computer mail systems (see e.g.
[SAN80]) often provide this kind of database service also. An
advantage with the ED3 approach is that the hierarchical
structuring and the easy-to-use tree browsing mechanisms significantly
improve the possibilities for managing larger collections of messages
or notes.


Directory for short files 

A notorious problem for people occupied with program
development or text processing is to organize the large number of files
which typically accumulate. Although the operating system
usually provides cataloguing facilities, these are often not
sufficient for giving reasonable identifying names to each file in a
large collection. In addition, most operating systems do not
promote the usage of many short files, but impose an overhead in
the form of fixed space allocations or restrict the number of
entries allowed in the catalogue structure etc.

ED3 provides a practical solution to this problem. It is in our
experience very convenient to use ED3 as an interactive interface
for access to collections of text files. The tree searching facilities
with automated support for menu selection (with one full text line
identifying the contents of the subtree) make navigating in the
file collection a very fast process. Retaining very short files is
not penalized in ED3, due to its support of variable-size text
nodes.

Writing programs 

The structuring primitives of ED3 are well suited for handling 
the source text of large program systems. Although it is in prin- 
ciple possible to break down a program into its constituent blocks 
and statements it is usually more convenient to organize the tree 
structure from packages, modules, procedures and so on, retaining 
the fundamental simplicity of pure text editing for program text 
blocks of a reasonable size. 

In fact ED3 contains some commands which directly support
program development. Thus there are commands for creating a
sequential text file from the tree representation within ED3, in
such a fashion that this file may be processed by the regular
programming language compiler (Pascal and Simula at present)
and that the structure is still indicated in the text file. That is, by
having comments of a certain appearance, the tree structure may
be reconstructed from the text file. This also means that the
structure is preserved even if the file is formatted with a pretty
printer.

Documenting system maintenance 

Maintaining a software system involving a number of
programmers is usually a very delicate problem, especially in applications
where end-user enhancements or modifications are often called
for, or when several programmers share the responsibility for
maintenance.

ED3 has been tried as a tool for documenting system maintenance
for an experimental CAI system containing about forty
packages, averaging ten to fifteen modules each. The possibility to
substitute a structured organization for the previously used
sequential maintenance log files was experienced as a considerable
improvement. The structure is here used for selection of a
specific package and module within the package. For each module,
subnodes may then be used to distinguish between documentation
of code corrections, general comments, error reports, enhancement
suggestions, etc.

Compiling standard documents 

In many office applications similar documents are produced over 
and over again, with small but significant differences from one 
document to another. Consider e.g. business correspondence or the 
preparation of contracts and tenders, where the main structure of 
the document is often fixed and where but a few variations of 
standard formulations occur for many of the paragraphs. 




In this case it is straightforward to have the main structure of a
standard document represented in an ED3 file, with text leaves
containing standard or selectable text fragments. Compiling a
document is then done by pruning the tree of those branches
which are not applicable for the task at hand and then using text
editing to enter specific information needed for the document
instance.

In this connection we have felt the restriction to pure hierarchies 
in the present system as an inconvenience, since the same texts 
may have to be redundantly stored in several leaves of the tree. 
The treatment of documents as graph structures, as is done in 
[FEI81], should be more appropriate. 


5. SUMMARY AND CONCLUSIONS. 

The ED3 structure-oriented text editor was designed as a prototype 
for an editor providing a uniform interface to text, pictures and 
formatted database records. This paper describes the
implementation of the structure handling mechanisms and the
screen-oriented text editing primitives, and mentions some
features of an experimental extension of ED3 for graphics editing.

Our experience with the implementation shows that ED3 is not only
a powerful text editor: the hierarchical structuring of text
objects also provides qualities for information handling of a kind
which is usually associated with database management facilities
rather than text editing. Examples were drawn from areas such as
personal databases, program development, documentation of
system maintenance, document preparation etc. Still we feel that
adding the possibility to handle formatted data records is an
important next step.

One limitation, which has been felt in the present implementation
of ED3, is that the leaves in an ED3 structure cannot be processed
as one consecutive text file, i.e. simulating the semantics of a
conventional text editor. Although there are some problems in
connection with modification operations across node borders, we
foresee no special difficulties in supplying a "file text editing
mode" in addition to the present text editing of one node at a time.

We have experienced the flexible hierarchical structuring mecha- 
nisms of ED3 as a very useful tool for organizing and main- 
taining larger text files or data collections. By observing a little 
discipline in the allocation of text to nodes (first line should be 
informative, since it will be used when the tree structure is 
displayed), a tree browsing interface based on menu selection is 
automatically supplied, which significantly improves the access to 
node data. 

Acknowledgments. 

The work with ED3 is supported by the Swedish Board of
Technical Development. We have received advice from Christian
Gustafsson, who also participated in the initial design. Sture
Hägglund supported us constantly while we wrote this paper and
also identified various application areas.


REFERENCES: 

[BUR80] Burkhardt, H., and Nievergelt, J., Structure-oriented
        editors, Berichte des Instituts für Informatik Nr. 38,
        ETH, Zürich, (1980).

[CAR80] Card, S. et al., The Keystroke-Level Model for User
        Performance Time with Interactive Systems, Comm.
        ACM, vol 23, no 7, pp 396-410, (July 1980).

[DAM71] van Dam, A. and Rice, D.E., On-line text editing: A
        Survey, Comput. Surv., vol 3, no 3, pp 93-114,
        (September 1971).

[ENG68] Engelbart, D. and English, W., A research center for
        augmenting human intellect, Proc. FJCC, vol 33, Pt 1,
        AFIPS Press, Montvale, N.J., pp 395-410, (1968).

[FEI81] Feiner, S. et al., Online Documents Combining
        Pictures and Text, Presented at the International
        Conference on Research and Trends in Document
        Preparation Systems, (February 1981).

[FRA80] Fraser, C.W., A Generalized Text Editor, Comm.
        ACM, vol 23, no 3, pp 154-158, (March 1980).

[LED80] Ledgard, H. et al., The Natural Language of
        Interactive Systems, Comm. ACM, vol 23, no 10, pp
        556-563, (October 1980).

[SAN78] Sandewall, E., Programming in an Interactive
        Environment: the Lisp Experience, ACM Comp.
        Surveys, vol 10, no 1, pp 35-71, (1978).

[SAN80] Sandewall, E. et al., Provisions for Flexibility in the
        Linköping office information system (LOIS),
        Proceedings of the 1980 NCC Conference, (1980).

[TEI78] Teitelman, W., The INTERLISP Reference
        Manual, Xerox PARC, Palo Alto, Calif., (1978).


APPENDIX: TREE EDITOR COMMANDS

[illegible]  Return to root node
n            n>0. Go down to nth subnode
0            Go up one level
z            Go down to last subnode
NX           Go to next node
BK           Go to previous node
P            Show structure of current subtree
H            Show header of current subtree
?            Show current edit chain, i.e. the path from the root node
Q            Show structure of tree saved in register
R            Rewrite current subtree (with print level 1)
D            Delete current subtree
E            Edit current node (text node or head of tree node)
j            Join current node with next
SW m,n       Switches the mth and nth subnodes
BI m,n       Make a new tree node of nodes m thru n
BO           Split current node into its subnodes
C            Save a copy of current subtree in register
X            Move current subtree to register
G    GA      Insert subtree from register after current node
     GB      Insert subtree from register before current node
     GN      Insert subtree from register after last subnode
I    IA      Insert new text node after current node
     IB      Insert new text node before current node
     IN      Insert new text node after last subnode





The Design of a Language-Directed Editor for 
Block-Structured Languages 





Joseph M. Morris 
Mayer D. Schwartz 

Applied Research Group 
Tektronix Laboratories 
Tektronix, Inc. 


ABSTRACT 

A language-directed editor combines 
the text manipulation functions of a 
general-purpose editor with the syntax- 
checking functions of a compiler. It 
allows a user to create and modify a pro- 
gram in terms of its syntactic structure. 
The design of a user interface and an
implementation for one such editor are
described in language-independent
terms. The design rationale is given. The
implementation is outlined in terms of its 
major data structures. 

1. INTRODUCTION 

The advent of powerful personal 
computers makes feasible highly interac- 
tive display-oriented tools for software 
development. An important program- 
ming tool in this class is a language- 
directed editor. It combines the func- 
tions of a regular editor with those of a 
parser, allowing the user to create and 
modify programs in terms of syntactic 
structure. Language-directed editors 
have been supporting the LISP program- 
ming community for many years, but 
only now are they being developed for the 
more difficult case of block-structured 
languages; this difficulty arises from the 
greater syntactic and semantic richness 
of such languages. 


Permission to copy without fee all or part of this 
material is granted provided that the copies are 
not made or distributed for direct commercial 
advantage, the ACM copyright notice and the 
title of the publication and its date appear, and 
notice is given that copying is by permission of 
the Association for Computing Machinery. To 
copy otherwise, or to republish, requires a fee 
and/or specific permission. 


© 1981 ACM 0-89791-043-5/81/0600/0028 $00.75


This paper describes the design of a 
language-directed editor for Pascal pro- 
grams. Both the user interface and the 
implementation of the editor are largely 
language-independent, and the design 
readily carries over to other block- 
structured languages such as C, Ada, and 
Algol 60. The presentation will concen- 
trate on those design and implementa- 
tion issues peculiar to language-directed 
editing; elements common to general- 
purpose editors will get briefer mention. 

The design to be presented in Sec- 
tion 3 is interesting in several respects. 
Its language-independence has already 
been mentioned. Text is entered
conventionally character by character and is
parsed token by token as it is being 
entered. The text may deviate from the 
syntactic rules to the degree necessary 
to permit editing flexibility, yet it retains 
sufficient syntactic integrity for the 
language-directed commands to be 
meaningful. 

The earliest examples of language- 
directed editing were for LISP; Sandewall 
[San78] describes the most powerful of
these interactive programming environ- 
ments. A display-oriented user interface 
to one of these LISP systems has been 
implemented on a personal computer 
[Tei77]. Editors specifically for
block-structured languages are included in
the Cornell Program Synthesizer for PL/CS
[TeR80], and the Gandalf project for Ada
at Carnegie-Mellon University
[Hab79, FeM81]. ALBE/P is an editor for
Pascal programs [LeP79]. Related work
on tree manipulation includes the
MENTOR program manipulation system at
INRIA [DHK79].

The following is an outline of this 
paper: Section 2 describes language- 
directed editing in general, including its 
advantages (Section 2.1), and its disad- 
vantages (Section 2.2). Section 2.3 
describes the conventional user interface 
model and its shortcomings. Section 2.4
describes a new model based on the
relaxation of strict syntactic validity. 
Section 3 gives an overview of a user 
interface and its implementation for a 
language-directed editor based upon the 
new model. Section 3.1 presents the user
interface. Section 3.2 outlines the parsing 
scheme, and Section 3.3 describes the 
organization of the program text. Sec- 
tion 4 makes some concluding remarks. 

2. LANGUAGE-DIRECTED EDITING 

In the following, the program text 
being edited will be referred to as the 
“object text". Most language-directed 
editors are screen-based; that is, the 
display screen of the terminal serves as a 
window onto the text. One inserts or 
modifies the text at the location of the 
display cursor, and the screen display is 
updated immediately to reflect the new 
state of the text. The text is automati- 
cally scrolled up or down through the 
display “window” as the cursor is moved 
about. 

2.1 Advantages 

Language-directed editors are 
claimed to allow more productive editing 
of program texts than general-purpose 
editors. They reduce typing effort by 
providing abbreviations for frequently- 
occurring text elements such as key- 
words, and by taking care of text- 
formatting. Moreover, because the editor 
commands can be tailored to the syntac- 
tic structure of the text, the user can 
edit a program with greater economy. 
For example, a single command will 
locate the next occurrence of a variable 
x, ignoring occurrences of x as a string 
in constants or comments. 
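
As a rough illustration of such a command (a sketch only, not the
editor's implementation), the following Python fragment locates the
next use of a name as an identifier while skipping string constants
and comments. The regular expression, the function name
find_identifier, and the simplified Pascal lexical rules are all
assumptions made for this example.

```python
import re

# Simplified Pascal lexical classes (an assumption for this sketch).
# Comments and strings are matched first, so their contents are never
# examined as identifiers.
TOKEN = re.compile(
    r"""
    (?P<comment>\{[^}]*\}|\(\*.*?\*\))   # { ... } or (* ... *)
  | (?P<string>'(?:[^']|'')*')           # 'text', with '' as a quoted quote
  | (?P<ident>[A-Za-z][A-Za-z0-9]*)      # identifiers and reserved words
  | (?P<other>.)                         # any other single character
    """,
    re.VERBOSE | re.DOTALL,
)


def find_identifier(text, name, start=0):
    """Return the offset of the next use of `name` as an identifier, or -1.

    `start` is assumed to lie on a token boundary, outside any comment
    or string.
    """
    for match in TOKEN.finditer(text, start):
        # Pascal identifiers are case-insensitive.
        if match.lastgroup == "ident" and match.group().lower() == name.lower():
            return match.start()
    return -1


source = "x1 := 'x'; { x } y := x + 1"
position = find_identifier(source, "x")
print(position, source[position:])
```

A plain string search for "x" would stop at "x1", at the string
constant, and inside the comment; the token-based search stops only
at the variable itself.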

Language-directed editors derive 
much of their increased effectiveness 
from incorporating the syntax-checking 
component of a language compiler - this 
can include all the “static semantic” 
checking normally carried out by a com- 
piler, such as checking for type compati- 
bility and agreement of actual and formal 
parameters. Text can be checked for 
syntactic correctness as it is being 
entered, and syntax errors immediately 
detected. The traditional cycle of editing 
and compiling, repeated many times until 
the program is free of "compile-time" 
errors, can be replaced by a single session
with a language-directed editor. This
is especially beneficial in a learning
environment, because the editor's 
response to erroneous input might well 
include the display of a list of syntactic 
constructs that can properly be entered 
at that point. Such immediate response 
to errors can decrease the time to learn 
the rules of the language. 

Finally, a language-directed editor 
can form the basis of an integrated set of 
programming tools to support all phases 
of program development. The editor 
represents the syntactic structure of the 
program text in a parse tree, and this 
tree can subsequently be the subject of 
an interpreter, debugger, and/or code 
generator. Ultimately, compilation as a 
separate and distinct step can be elim- 
inated. 

2.2 Disadvantages 

Two salient disadvantages attend 
language-directed editors. First, they 
are profligate consumers of computer 
resources, to the extent that they are 
usually designed for single-user systems 
or multi-user systems carrying a low 
interactive load. Parsing consumes pro- 
cessing power, the parse tree devours 
storage, and there is no solution but to 
supply plenty of each. 

Their second disadvantage is a 
consequence of the tight syntactic con- 
straints they impose on the object text. 
In transforming a legal program P into 
another one P’, the shortest sequence of 
editing operations might well take the 
object text through syntactically illegal 
states. For example, consider transform- 
ing the while-statement 

while a > b do a := a - c; 

into the corresponding if-statement 

if a > b then a := a - c; 

or, as another example, bracketing a 
pre-existing sequence of statements with 
begin and end. As soon as a single char- 
acter of the reserved words is altered, 
the syntax rules are violated. While it 
should be possible to violate the syntax 
rules to the degree necessary for editing 
flexibility, the text must retain sufficient 
syntactic integrity for the editor to merit 
the name of “language-directed". This is 
a difficult design problem, but although it
is as intrinsic to language-directed editing
as the previously mentioned resources
problem, it is not as intractable. The
currently preferred solution is described
in the next section. It is not wholly
satisfactory, however, and a new
solution will be offered.

2.3 The Conventional Model

As mentioned above, the first design
problem confronting the creator of a
language-directed editor is that of
maintaining the syntactic structure of the
object text, and the second is to balance
this obligation with the need for editing
flexibility. Language-directed editors
typically resolve this conflict with a
scheme which may be called "the
template model". The syntactic structure of
the text is represented by a parse tree.
Text is entered by first creating a
template - a skeleton of some syntactic unit
- and then filling in the details. For
example, to insert a while-statement at
the position of the cursor, the user might
first type a command to insert an empty
template into the text, and its syntactic
structure into the parse tree. When first
entered into the text, the template has
the form

while <boolean-expression>
do <statement> ;

The user will then enter the boolean
expression character by character
(causing the prompt "<boolean-expression>"
to disappear), and will go on to develop
the <statement>. The boolean expression
will usually be checked for validity
only when it has been entered in full.
Thus, the text will always be a complete
and correct "shell" of a program, but
some of the details may be missing or
erroneous. Templates are generated by
command, while expressions and
assignment statements are entered character
by character. (Refer to [TeR80], [FeM81]
or [LeP79] for a more complete
description of editors that operate on the
template model.) Although this scheme is
relatively simple to implement, it results
in an inflexible interface. To return to an
example of the previous section, the
transformation of a while-statement into
the corresponding if-statement is carried
out by first inserting a template for the
if-statement, then moving the
constituents of the while-statement to the
corresponding places in the if-template,
and subsequently deleting the
while-statement.

2.4 A New Model

A different model, which seeks to 
overcome the inflexibility of the template 
model, is proposed. In this model, all 
text is entered character by character, 
as with a general-purpose editor. Text 
may be altered at will, provided it obeys 
what will be called a “language con- 
straint” - the text up to the token 
immediately preceding the cursor must 
be the legal beginning of a Pascal pro- 
gram, and the remaining characters 
preceding the cursor must constitute a 
legitimate beginning of a token. The user 
enters text just as with a normal screen- 
based editor, but on making a syntax 
error is immediately notified. An error 
need not be corrected immediately, but 
until it is, the cursor is confined to the 
text at or preceding the error. Observe 
that a second error might be introduced 
in the text preceding the first one, and so 
on. Nevertheless, the language con- 
straint ensures that any region of text 
preceding the first error is syntactically 
correct and can be edited. Moreover, the 
editor has sufficient flexibility for the 
user to carry out modifications to the 
object text by the shortest route. To 
take an earlier example, a while- 
statement is converted to a correspond- 
ing if-statement simply by editing left to 
right character by character as with a 
general-purpose editor. 
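
The language constraint can be made concrete with a small sketch.
The Python fragment below is an illustration under stated
assumptions, not the editor's algorithm: it uses a toy grammar of
three statement forms instead of Pascal, assumes tokens are
separated by blanks, and checks the two parts of the constraint
separately. All names (TABLE, is_program_prefix, is_token_prefix,
obeys_language_constraint) are hypothetical.

```python
RESERVED = {"while", "do", "if", "then"}
OPERATORS = {":=", ">", "-"}
NONTERMINALS = {"S", "E", "R"}

# Hand-built LL(1) table for the toy grammar (an assumption, not Pascal):
#   S -> while E do S | if E then S | id := E
#   E -> id R
#   R -> > id | - id | (empty)
TABLE = {
    ("S", "while"): ["while", "E", "do", "S"],
    ("S", "if"): ["if", "E", "then", "S"],
    ("S", "id"): ["id", ":=", "E"],
    ("E", "id"): ["id", "R"],
    ("R", ">"): [">", "id"],
    ("R", "-"): ["-", "id"],
    ("R", "do"): [],
    ("R", "then"): [],
}


def classify(word):
    """Map a complete word to a terminal symbol, or None if it is no token."""
    if word in RESERVED or word in OPERATORS:
        return word
    if word[0].isalpha() and word.isalnum():
        return "id"
    return None


def is_program_prefix(tokens):
    """True if the tokens are a legal beginning of a toy program."""
    stack = ["S"]
    for token in tokens:
        while True:
            if not stack:
                return False              # tokens continue past a complete program
            top = stack.pop()
            if top not in NONTERMINALS:
                if top != token:
                    return False          # predicted terminal does not match
                break                     # token consumed
            production = TABLE.get((top, token))
            if production is None:
                return False              # no table entry: syntax error
            stack.extend(reversed(production))
    return True


def is_token_prefix(chars):
    """True if the characters are a legitimate beginning of some token."""
    if chars == "":
        return True
    if chars[0].isalpha() and chars.isalnum():
        return True                       # start of an identifier or reserved word
    return any(op.startswith(chars) for op in OPERATORS)


def obeys_language_constraint(text_before_cursor):
    *complete, partial = text_before_cursor.split(" ")
    tokens = [classify(word) for word in complete if word]
    if None in tokens:
        return False
    return is_program_prefix(tokens) and is_token_prefix(partial)


print(obeys_language_constraint("while a > b d"))   # "d" may still become "do"
print(obeys_language_constraint("while a b "))      # error once "b" is complete
```

Note that "while a b" with the cursor directly after "b" still obeys
the constraint in this sketch, since "b" is only the token
immediately preceding the cursor; the error is reported when the
token is completed by the following blank.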

The language constraint is practica- 
ble only if it can be realized by an 
efficient incremental parsing algorithm, 
i.e. a parsing algorithm which can accom- 
modate changes to the object text with a 
minimum of reparsing of existing text. 
Such a parsing scheme, which depends on 
maintaining not one parse tree but a 
sequence of parse trees, is described in 
Section 3.2. 

3. A LANGUAGE-DIRECTED EDITOR BASED 
ON THE NEW MODEL 

3.1 User Interface 

The design of any user interface 
must be approached more or less simul- 
taneously from two almost opposite 
directions. Viewed from one side, the 
interface must be pleasant to use; from 
the other, it must allow a reasonable 
implementation. Interfaces which are 
functionally similar and equally pleasant 
may impose widely differing demands on
the implementor, so one must be careful
not to make his task needlessly difficult. 
For purposes of this presentation, the 
pleasantness and implementation aspects 
are sharply separated. 

The editor accepts Pascal text, 
character by character, as described in 
Section 2.4, and formats the text to 
reflect its syntactic structure. The 
design of this formatting function has 
been problematic. Wide differences in 
programmers’ tastes preclude the impo- 
sition of a fixed formatting standard, 
which is the simplest strategy to imple- 
ment. A compromise was made in which 
the user is expected to create the line 
breaks (by inserting carriage-return 
characters) and the editor takes care of 
the indentation of lines. The editor 
indents lines by prefixing them with a 
number of “indentation units”, the value 
of an indentation unit (in terms of char- 
acter positions) being under the control 
of the user. Such an auto-indenting func- 
tion is extremely complex to manage. 
Consider, for example, that it must re- 
adjust the indentation of lines of text 
moved to a new nesting level, and lines of 
text subsequently bracketed by a begin- 
end pair. Consider also that in some 
cases, the indentation of a new line can- 
not be known exactly until the first token 
has been typed - an example is the inser- 
tion on a new line of the final end in a 
begin-end block. To ease the 
implementor’s burden, the user does not 
have the freedom to insert blank charac- 
ters at the beginning of a line; even so, 
the auto-indentation algorithm remains
complex. 

One advantage of the template 
model is the ease of entering reserved 
words. The user interface for the new 
model retains some of this advantage by 
offering a "generic key”. If enough of a 
reserved word has been entered to 
disambiguate it from other reserved 
words, then pressing the generic key is 
equivalent to entering the remainder of 
that reserved word. 
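
The behaviour of the generic key might be sketched as follows. This
is an illustration only: the function name complete_reserved_word is
hypothetical, and the word list is that of standard Pascal, which
the editor described here may or may not use unchanged.

```python
# Reserved words of standard Pascal (assumed to be the editor's list).
PASCAL_RESERVED = (
    "and", "array", "begin", "case", "const", "div", "do", "downto",
    "else", "end", "file", "for", "function", "goto", "if", "in",
    "label", "mod", "nil", "not", "of", "or", "packed", "procedure",
    "program", "record", "repeat", "set", "then", "to", "type",
    "until", "var", "while", "with",
)


def complete_reserved_word(prefix, reserved=PASCAL_RESERVED):
    """Return the remainder of the reserved word that `prefix` identifies.

    Returns None when the prefix is empty or matches no word or more
    than one word, i.e. when it is not yet unambiguous.
    """
    prefix = prefix.lower()
    matches = [word for word in reserved if word.startswith(prefix)]
    if prefix and len(matches) == 1:
        return matches[0][len(prefix):]
    return None


print(complete_reserved_word("wh"))    # remainder of "while"
print(complete_reserved_word("pro"))   # ambiguous: procedure, program
```

With this rule a word that is itself a prefix of another reserved
word, such as "do" (also the start of "downto"), is never completed
by the generic key and must be typed in full.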

The editor includes the usual commands
found in general-purpose editors,
but with some distinctive qualities
stemming from knowledge of the syntactic
structure of the object text. Hence, the
display cursor can be moved both by
units of a single character and by
syntactic units - a token, a statement (or any
major syntactic unit such as a
type-definition or variable-declaration), or a
major syntactic unit at a similar nesting
level. As another example where the
structure of the object text can be used
to advantage, there is a find command
which takes a parameter x, where x can
be an identifier, a variable, an expression,
a label, or a character string (suitably
distinguished). This command locates
the next instance of text similar to x:
thus if x is a variable, the cursor will be
positioned at the next occurrence of that
variable. A related command is the
replace command, which takes two
parameters x and y; it is equivalent to
executing a find command, with x as a
parameter, followed by replacing the
instance of x thus found with y. It is
therefore an easy task, in contrast with
general-purpose editors, to rename the
variables of a program. (Commands may
be prefixed by repetition counts.)

As in general editors, commands are 
provided to delete, move, or output a tex- 
tual extent; i.e., a continuous piece of 
text. The user delimits a desired extent 
by "marking" the first (or last) character
of the desired extent, and performing 
zero or more cursor movements to take 
the cursor to the final (or first) charac- 
ter. A delete, move, or write command is 
then issued to operate on the extent thus 
delimited. (A move command moves the 
textual extent to an anonymous file 
whence it can be reinserted into the text 
at any place.) Because the cursor move- 
ments are defined in terms of syntactic 
units, this scheme permits easy manipu- 
lation of the syntactic units of the object 
text, and requires just a small set of com- 
mands. 

The user interface has been 
designed to be as safe as possible: 
changes to a program are made visible on 
the screen; there is a command to undo 
the most recent continuous sequence of 
character insertions and deletions; and 
simplicity has been the touchstone 
throughout. 

3.2 Parsing 

Even though the editor is intended
for Pascal programs, it has been designed
to be as language-independent as
possible. This makes it relatively easy to
provide a set of editors for different
block-structured languages, all the editors
having similar user interfaces and
implementations. Formal parsing techniques
contribute to language-independence, and
have been employed in the design; the
parsing algorithm used is an extension of
LL(1) parsing (see [AhU72] for details).

The term “parse tree” is used in its 
usual sense (see for example [AhU72]); 
each node is labeled with the appropriate 
terminal or non-terminal symbol. A 
parse tree includes, for each terminal, a 
set of pointers to the program text, del- 
imiting the terminal's textual represen- 
tation together with any associated 
separators. A terminal node is said to 
"cover" the text so pointed to. An inter- 
nal node covers the text which is covered 
by its sons. A parse tree pointer, which is 
a pointer to a leaf node of the first parse 
tree, is also maintained; it moves in lock 
step with the physical cursor on the 
screen. 
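
One possible representation of such a tree, given here only as a
sketch, stores a text span with each terminal node and derives the
cover of an internal node from its sons. The class name Node, the
use of half-open character offsets, and the convention that a
terminal's span includes its trailing separator are assumptions of
this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    symbol: str                               # terminal or non-terminal label
    span: Optional[Tuple[int, int]] = None    # terminals: [start, end) in the text
    sons: List["Node"] = field(default_factory=list)

    def cover(self):
        """Return the (start, end) text extent covered by this node, or None."""
        if not self.sons:
            return self.span                  # terminal, or unexpanded leaf
        spans = [son.cover() for son in self.sons]
        spans = [s for s in spans if s is not None]
        if not spans:
            return None
        return (spans[0][0], spans[-1][1])


text = "a := b"
tree = Node("assignment", sons=[
    Node("identifier", (0, 2)),   # "a" plus its trailing blank
    Node(":=", (2, 5)),           # ":=" plus its trailing blank
    Node("identifier", (5, 6)),   # "b"
])
start, end = tree.cover()
print(text[start:end])
```

A leaf non-terminal that has not yet been expanded covers no text,
which is why cover() may return None.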

The editor's parsing algorithm 
extends traditional LL(1) parsing. The
LL(1) parse tables are augmented by
including column entries for both non- 
terminals and terminals. It will become 
clear shortly that this makes efficient 
parsing possible despite arbitrary inser- 
tions or deletions in the existing text 
(subject to the language constraint). In 
standard LL(1) parsing, the state of the
parse is maintained by a stack and a 
pointer to the next token in the text. In 
the editor, a set of parse trees is used 
which represents the complete parse of 
the text. Because the underlying parsing 
algorithm is a top-down predictive 
method, the top element of the parse 
stack is indicated by the parse tree 
pointer and the next token is located at 
the display cursor. The parse tree 
pointer, in pointing to a (terminal or 
non-terminal) leaf node, predicts the 
legal set of acceptable tokens. 

A consequence of the language con- 
straint is that the syntactic structure of 
the text cannot necessarily be covered by 
a single parse tree, but must be covered 
by a sequence of parse trees, with adja- 
cent parse trees covering adjacent pieces 
of text. The dividing point between the 
text covered by one parse tree and that 
covered by the next parse tree is called a 
"discontinuity”; discontinuities are not 
apparent to the user. Initially, there is 
just one parse tree. During an editing
session, however, parse trees may split or
join as text is inserted, deleted, or the
cursor moved forward. The requirement 
that the text cursor not be moved past a 
syntax error (a direct consequence of the 
language constraint) implies that the 
parse tree pointer is always at a node 
which is in the first parse tree. The root 
node of the first parse tree is labeled with 
the start symbol. 

Whenever a command is issued 
which would move the text cursor beyond 
the token immediately following the first 
discontinuity, the first two parse trees 
will be joined, if possible, without repars- 
ing. Denote by A, the label of the leaf 
non-terminal at the parse tree pointer, 
and by B, the label of the root node of the 
second parse tree. The two trees can be 
joined if A and B are the same non- 
terminal, or if A can derive a sequence of 
symbols having B as the first element. 
For this reason, non-terminals can 
behave as terminals, and the extended 
LL(1) parse table has non-terminal 
column entries as well as terminal 
column entries. If the two trees cannot 
be joined by any one of the above cri- 
teria, then the second parse tree must be 
split into smaller trees and the joining 
algorithm applied to the new second tree. 
This process continues until the trees are 
joined or until no further splitting can be 
accomplished, in which case an error is 
indicated. For example, suppose the cur- 
sor is immediately after a begin which 
has just been inserted between two state- 
ments; if the cursor moves forward into 
the second statement, there is no need to 
reparse that statement - its entire parse 
tree is merely inserted into the first 
parse tree. In some cases, as mentioned 
above, the first two parse trees cannot 
immediately be joined. For example,
suppose that the text cursor is between "a
:=" and the (previously parsed)
procedure-statement "p(x)". The parse
tree representing "p(x)" would have to
be split into its component parts so that
it could be reparsed as an expression.
The extended LL(1) parsing algorithm
incorporates in its parse table entries
which determine when trees are to be
split or joined. (The details of the
extended LL(1) parsing algorithm, the
splitting algorithm, and the calculation of 
the parse table entries will be presented 
in a subsequent paper.) 
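
Pending that paper, the joining criterion stated above can be
sketched directly from its definition: A and B can be joined if they
are equal, or if A can derive a sequence of symbols beginning with
B. The Python fragment below is only an illustration; the toy
grammar, its symbol names, and the function can_join are
assumptions, the grammar has no empty productions, and the sketch
searches the grammar rather than consulting a precomputed parse
table as the editor is said to do.

```python
# Toy grammar: non-terminal -> list of productions (assumed for this sketch).
GRAMMAR = {
    "statement-list": [["statement"],
                       ["statement", ";", "statement-list"]],
    "statement": [["assignment"], ["procedure-statement"],
                  ["compound-statement"]],
    "compound-statement": [["begin", "statement-list", "end"]],
    "assignment": [["variable", ":=", "expression"]],
    "procedure-statement": [["identifier", "(", "expression", ")"]],
    "expression": [["variable"], ["function-call"]],
    "function-call": [["identifier", "(", "expression", ")"]],
    "variable": [["identifier"]],
}


def can_join(a, b, grammar=GRAMMAR):
    """True if a == b or a derives a symbol sequence whose first element is b."""
    seen = set()
    frontier = [a]
    while frontier:
        symbol = frontier.pop()
        if symbol == b:
            return True
        if symbol in seen:
            continue
        seen.add(symbol)
        # Follow only the first symbol of each production.
        for production in grammar.get(symbol, []):
            frontier.append(production[0])
    return False


# After an inserted "begin", the pointer is at a leaf "statement-list" and
# the second tree is a whole statement: the trees join without reparsing.
print(can_join("statement-list", "statement"))
# After "a :=", the pointer is at "expression"; a procedure-statement tree
# cannot be joined and must first be split.
print(can_join("expression", "procedure-statement"))
```

When can_join fails, the second tree would be split into its
subtrees and the test repeated on the new second tree, as described
above.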




3.3 Object Text 

The editor must respond to user 
activity without noticeable delay. This 
means that in the perennial space-time 
tradeoff, more space is favored for less 
time. Along with the parse tree(s) of the 
program (being edited), the editor main- 
tains the complete object text; both are 
stored in a tailor-made virtual memory. 
The text is kept as a doubly linked list of 
fixed size records, organized into logical 
lines; a logical line consists of one or 
more of these records. Each logical line 
begins with an indentation, which is 
represented by a pair (c,d) where c
denotes the number of indentation units 
for the line and d is an adjustment, usu- 
ally zero. This allows changing the size of 
an indentation unit without changing any 
of the logical lines. Note that the inden- 
tation is only valid for logical lines 
covered by the first parse tree. For 
example, if a begin is inserted before a 
sequence of statements, those state- 
ments will only be given an extra indenta- 
tion unit as the cursor moves into them. 

Before text is output to any device, 
including the screen, each logical line is 
converted into one or more physical lines 
by expanding the indentation into blanks 
and, in the case of output to the screen, 
converting non-printable characters to a 
two-character representation. 
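
The conversion from logical to physical lines might look like the
following sketch. Several details are assumptions rather than
statements from the design: that the indentation width is c times
the unit plus d, that long lines are simply cut at a fixed width,
and that the two-character representation of a control character is
a caret followed by a letter. The names visible and physical_lines
are hypothetical.

```python
def visible(ch):
    """Two-character form for control characters (assumed caret notation)."""
    code = ord(ch)
    if code < 32 or code == 127:
        return "^" + chr(code ^ 0x40)
    return ch


def physical_lines(c, d, body, unit=4, width=80, for_screen=False):
    """Expand the logical line (c, d, body) into one or more physical lines.

    c    -- number of indentation units
    d    -- adjustment in character positions, usually zero
    unit -- current size of an indentation unit, chosen by the user
    """
    if for_screen:
        body = "".join(visible(ch) for ch in body)
    text = " " * max(0, c * unit + d) + body
    return [text[i:i + width] for i in range(0, len(text), width)] or [""]


# The stored pair (2, 0) is unchanged when the user alters the unit size.
print(physical_lines(2, 0, "x := 1", unit=2))
print(physical_lines(2, 0, "x := 1", unit=3))
```

Because only the pair (c,d) is stored, changing the indentation
unit affects the output of this function without touching any
logical line.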

4. CONCLUSION 

An overview of the design of a 
language-directed editor has been 
presented, describing both the user 
interface, and the implementation stra- 
tegy. It is an interesting design on a 
number of counts. The user interface, 
apart from program text of course, is 
language-independent. Text is entered 
character by character, without the need 
for special commands, and is parsed 
token by token as it is entered. This 
style of input is both convenient and con- 
ventional, and leads to immediate 
notification of syntax errors. The user is 
granted the freedom to modify text by 
the most economical route, yet the text 
retains sufficient syntactic integrity for 
the language-directed commands to have 
meaning. Finally, the implementation is 
largely language-independent; the 
language-dependent components are lim- 
ited to the parse table, the scanner, and 
the line indentation function. It is still
undetermined whether the design yields
an acceptable response time - that ques- 
tion can only be answered empirically; for 
this reason, we are currently building a 
prototype of the editor. It includes all 
the features described and its completion 
is scheduled for early 1981. 

REFERENCES 

[AhU72] Aho, A.V., and Ullman, J.D. The
Theory of Parsing, Translation
and Compiling, Vol. I.
Prentice-Hall, Englewood Cliffs,
N.J., 1972.

[DHK79] Donzeau-Gouge, V., Huet, G.,
Kahn, G., and Lang, B. The MENTOR
program manipulation system.
IRIA-Laboria, Aug. 1979.

[FeM81] Feiler, P.H. and Medina-Mora, R.
An incremental programming
environment. Proc. 5th International
Conference on Software Engineering,
422-429, Mar. 1981.

[Hab79] Habermann, A.N. An overview of
the Gandalf project. Computer
Science Review 1978-79,
Carnegie-Mellon University,
1979.

[LeP79] Lewis, J.W. and Porges, D.F.
ALBE/P: a language-based editor
for Pascal. Proc. Eighth
Texas Conf. on Computing Systems,
Nov. 1979.

[San78] Sandewall, E. Programming in
the interactive environment:
the LISP experience. ACM Comp.
Surveys 10, 1 (March 1978), 35-71.

[TeR80] Teitelbaum, T. and Reps, T. The
Cornell program synthesizer: a
syntax-directed programming
environment. TR 80-421, Dept. of
Comp. Sci., Cornell University,
May 1980.

[Tei77] Teitelman, W. A display oriented
programmer's assistant. CSL
77-3, Xerox Palo Alto Research
Center, March 1977.





Etude and the Folklore of User Interface Design 

Michael Good 

Laboratory for Computer Science 
Massachusetts Institute of Technology 
Cambridge, Massachusetts 02139


1. Introduction 

Research in user interface design is like the weather:
everybody talks about it, but nobody does anything about
it. While this isn’t strictly true, the great majority of
guidelines for user interface design that one is likely to
come across are based on the experience or gut feelings of a
particular designer. These are better than nothing, but they
are made less useful since 1) a particular recommendation
could be based on factors unique to the designer's own
system, 2) the population for whom the particular system is
intended is either not described in detail or is not a
generalizable sample of computer users, and 3) designers’ gut
feelings are notoriously inaccurate sources for human
factors guidelines. Thus the main body of recommendations
available to the designer of a new system is more in
the category of folklore than of readily accepted
engineering principles.

So long as one recognizes these limitations, it is still very
helpful to consider the recommendations one finds in the
literature when designing a new system. Some of these are
based on experimental evidence, while others are repeated
often enough and with so little opposition that their utility
is better than average. In this paper, I will show how these
principles have been applied in designing the Etude text
processing system. After summarizing the major ideas
behind the design of Etude, I will focus on several specific
areas of user interface design, comparing Etude’s approach
with the appropriate recommendations from the folklore. I
will conclude by briefly describing a forthcoming
experiment which is intended to determine if adherence to the
folklore has in fact produced a system that is easy to use.


© 1981 ACM 0-89791-043-5/81/0600/0034 $00.75


2. Etude 

The Etude interactive editor and formatter is the first 
component of an integrated office work station being
developed by the Office Automation Group of the MIT
Laboratory for Computer Science. It is intended to run on
a small, powerful computer system with a high-resolution
bit-map display. The interactive nature of the system
means that the user can create, edit, and format a document
at a terminal and see the results immediately displayed on a 
full page, high resolution display screen. Other tools, such 
as an image processing system, a database management 
system, and an electronic mail system, will be incorporated 
in an integrated manner so that the user will not be 
conscious of having to switch between systems in order to 
perform the necessary tasks. 

The design goals of Etude, as described in the specifications 
[15], are as follows: 

- The system should be very easy to learn to use. We 
expect that a person with no specialized training should 
be able to sit down in front of the system, and, with 
system prompts and queries, be able to create, edit, and 
print out a formatted document. At any time, if the user 
is unsure of what he can or should do, he may ask the 
system for help, and the system will display and explain 
the available options. 

- The experienced user should not be encumbered by the 
facilities which are present to aid the novice. 

- When editing and formatting a document, the user will 
be able to express what he wishes to do in a natural way, 
so he may easily translate his thoughts into system 
commands. 

- The system will contain a large amount of typographic 
knowledge so that usual formatting can be handled 
automatically by the system, with some assistance from 
the user. 

The latter three goals emphasize that the primary design 
goals are ease of learning and ease of use. While in many 
systems these two goals may appear to be at odds with each
other. Etude provides optional features that make the 
system easy for the novice to learn without frustrating the 
expert. For example, while menus may be provided at any 
point by pressing the menu key, the use of a menu is never 
required for any operation — alternatives may be specified 
directly by the experienced operator. Expressing the 
commands in a natural way and incorporating a large 
amount of knowledge into the system results in the use of 
higher level formatting commands than those found in 
most formatting systems. Letters can be written by 
specifying a return address, address, salutation, body, 
closing, and notations. These high level objects are referred 
to as the components of a document and vary over different 
document types. Type face, margins, spacing, and indentation 
are all handled automatically by the system, relieving 
the user from the burden of creating detailed specifications. 
These benefits apply both to novices and experienced users. 

Another major consideration in the design of Etude is the 
so-called "anxiety factor” [11]: 

Frequently, a long period of acclimatisation must elapse 
before an operator is sufficiently expert with a system to feel 
truly comfortable with it. In the interim, the user’s feelings 
are akin to those associated with walking a tightrope while 
wearing a blindfold. Because of the often obscure nature of 
the interface that he is forced to employ, the operator cannot 
fully anticipate the consequences of the actions he performs. 

This leads to feelings of tension and uncertainty. Moreover, 
the user develops a fear of committing an unrecoverable 
error, and thereby becomes overly timid and cautious in his 
dealings with the system. 

Etude attempts to alleviate this anxiety in several ways, 
most notably by providing an undo key which will reverse 
the effects of any previous operation or sequence of 
operations. Also provided are a cancel key, which stops the 
current operation, and a help key that can be used at any 
time. 

The design of Etude began in 1979, after a large number of 
text editing and formatting systems were surveyed. Fea- 
tures from these systems which appeared useful in meeting 
our goals were incorporated into Etude [14]. Three 
important examples are: 

- The command structure, based on Doc [34]. 

- The idea of using high level formatting objects, based 
on Scribe [35]. 

- An internal structure based on the boxes and glue of 
TeX [19]. 

After a prototype version was completed early in 1980, 
Etude was thoroughly reviewed in terms of its compliance 
with the user interface guidelines found in the folklore. As a 
result, we changed several features and are reworking the 
system implementation for greater efficiency. I will now 
examine how the current version of Etude measures up to 
the collected wisdom in the design folklore. Relevant 
features will be introduced as required. Hammer et al. [11] 
have described the system in more detail. 

3. Command Structure 

Etude's basic command structure follows verb-modifier-object 
form. The most commonly used verbs, modifiers, 
and objects are placed on special keys. Verbs include 
commands such as go-to, erase, begin, copy, and change. 
Modifiers include next, previous, start-of, end-of, and 
positive integers. Objects include "low-level” ones such as 
character, sentence, line, and paragraph, typefaces such as 
bold and italic, and the document components described in 
the previous section. Commands are formed through 
combinations such as erase sentence, go-to end-of next 
paragraph, begin address, and change previous 3 line(s) (to) 
quotation. Characters shown in parentheses are provided 
by Etude when echoing the command in a special 
command line. This prompting is similar to the use of 
"noise words" in Tenex [4]. 

This choice of command structure is becoming more 
popular in computer systems, especially in the area of word 
processing. The premise behind this approach is that it is 
easier for a user to perform complex functions by com- 
bining relatively small sets of primitive objects into 
commands than by having to remember and choose from a 
much larger set of separate commands. These primitives 
combine to form English phrases, with system prompts 
providing the necessary function words (e.g., prepositions 
and articles [39]). This is intended to present the user with 
an easy to use natural syntax while avoiding the “user- 
fooling” problems associated with “natural language” 
systems [33]. As Bennett [2] comments: 

Following this line of reasoning, we observe that the choice of 
a verb-like vocabulary to represent commands and a noun- 
like vocabulary for labels does have proven value both for 
establishing an easy-to-teach conceptual framework and for 
aiding memory during use. This highly desirable practice, 
drawing on a user's language habits, can be productive even 
though it stops far short of the full natural language 
exchanges envisioned by some. 

An experiment performed at the University of Massachusetts 
at Amherst provides supporting evidence for this 
strategy [21]: 

The results demonstrate that redesigning the surface syntax of 
a commercial editor so that the commands more closely 
resemble English phrases resulted in far better performance. 

On all measures, performance using the English editor was 
superior to performance using the notational editor. This was 
true regardless of the experience level of the users. 
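
As an illustration only, the verb-modifier-object structure described at the start of this section can be sketched in a few lines of Python. The word lists come from the examples in the text; the function names parse_command and echo, and the rule that only change takes a second object, are assumptions made for this sketch and are not taken from the Etude implementation.

```python
# Hypothetical vocabulary, limited to the words used as examples in the text.
VERBS = {"go-to", "erase", "begin", "copy", "change"}
MODIFIERS = {"next", "previous", "start-of", "end-of"}
OBJECTS = {"character", "sentence", "line", "paragraph",
           "bold", "italic", "address", "quotation"}


def parse_command(keys):
    """Split a key sequence into (verb, modifiers, object, target)."""
    if not keys or keys[0] not in VERBS:
        raise ValueError("a command must start with a verb")
    verb, rest = keys[0], list(keys[1:])
    modifiers = []
    # Modifiers are the fixed modifier words plus positive integers.
    while rest and (rest[0] in MODIFIERS or rest[0].isdigit()):
        modifiers.append(rest.pop(0))
    if not rest or rest[0] not in OBJECTS:
        raise ValueError("expected an object after the modifiers")
    obj = rest.pop(0)
    target = None
    if verb == "change":  # assumption: only "change" names a second object
        if len(rest) != 1 or rest[0] not in OBJECTS:
            raise ValueError("change needs exactly one target object")
        target = rest.pop(0)
    if rest:
        raise ValueError("unexpected keys after the object")
    return verb, modifiers, obj, target


def echo(keys):
    """Build the command-line echo, adding the parenthesized noise words."""
    verb, modifiers, obj, target = parse_command(keys)
    if any(m.isdigit() for m in modifiers):
        obj += "(s)"
    words = [verb, *modifiers, obj]
    if target is not None:
        words += ["(to)", target]
    return " ".join(words)


print(echo(["change", "previous", "3", "line", "quotation"]))
```

A dozen or so primitives already yield a large number of distinct commands, which is the premise of the approach: the user remembers the primitives, not the combinations.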

There is no special “insert mode” in Etude. Text is inserted 
at the current cursor position (located between characters) 
as it is typed. This avoids the oft-encountered mode 
switching problem, mentioned by Newman and Sproull 
[29]: 

Each such state in which a given operation by the user is 
interpreted differently by the program is called a command 
mode. The more modes in a command language, the more 
likely the user is to make mistakes by forgetting which mode 
he is in. Single-mode command languages avoid this problem 
completely. (p. 451) 

4. Abbreviations 

In Etude, the most frequently used verbs, modifiers, and 
objects are located on special keys. Less frequently used 
objects can either be selected from a menu or typed in 
during the command. When typing in an object name, only 
the number of letters that are needed to identify the object 
need to be typed. Etude will complete the object name in 
the command window whenever the name can be recog- 
nized. Often, the user need only type the first letter of the 
object name. This abbreviation scheme has several 
advantages [30]: 

The capability for the computer to perform recognition on a 
partially complete character string effectively combines the 
principles of concise computer-to-user messages, prompting, 
and efficient training procedures. 

It also follows a recommendation of Kennedy [17]: 

Communication should be carried out in a terse "natural” 
language, avoiding the use of codes and mnemonics. 
Abbreviations should be allowed wherever possible. 
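
The recognition of partially typed object names can be sketched as follows. This is a minimal illustration, assuming that a name is "recognized" as soon as the typed letters match exactly one known name; the function names and the sample name list are hypothetical and do not describe Etude's actual code.

```python
def complete(typed, names):
    """Return the one name that `typed` identifies, or None if it is
    ambiguous or unknown."""
    if typed in names:  # a full name always identifies itself
        return typed
    matches = [name for name in names if name.startswith(typed)]
    return matches[0] if len(matches) == 1 else None


def keystrokes_needed(name, names):
    """Smallest number of leading letters that identifies `name`."""
    for count in range(1, len(name) + 1):
        if complete(name[:count], names) == name:
            return count
    raise ValueError(f"{name!r} is not one of the known names")


# Sample names drawn from the objects mentioned in the text.
OBJECT_NAMES = ["address", "body", "closing", "notations",
                "quotation", "salutation", "sentence"]

print(complete("q", OBJECT_NAMES))                  # one letter is enough
print(complete("s", OBJECT_NAMES))                  # ambiguous, so no completion yet
print(keystrokes_needed("sentence", OBJECT_NAMES))
```

With this sample list most names are identified by their first letter, and only the two names beginning with "s" need a second one.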

This kind of an abbreviation facility is similar to one that 
Walther provided experimental subjects as part of a flexible 
interface [43]. His experiment attempted to show the 
influence of a flexible interface on user satisfaction, taking 
into account the influence of user experience and terminal 
type (CRT or teletypewriter). In general, his results were 
inconclusive, as correlations usually depended on all three 
variables, not just flexibility. Results also differed de- 
pending on the criterion used to measure satisfaction 
(performance time, different types of user attitudes, or a 
measure of anxiety). However, the one clear result in his 
study was that users found the more flexible system to be 
“friendlier,” independent of the other two variables: 

Users of a flexible interface will perceive their system to be 
more benevolent than will users of a rigid, unadaptable 
interface. If it is important that users have favorable attitudes 
to the extent that they regard the computer as being tolerant, 
flexible, like a person, friendly, and pleasant, the interface 
program should offer the user options like those used in this 
experiment. (p. 138) 

The ability to use menus, full names, or abbreviated names 
is one way of meeting the goal that features for the novice 
not encumber the expert. It is akin to a scheme 
recommended by Shneiderman [38]: 


With careful design a system could satisfy a broad range of 
users. Novices would get a set of menus. As the users gained 
experience, a fill-in-the-blank approach could be employed, 
but if users forgot the choices, a blank entry would produce 
the menu. Finally, the most experienced users could make 
parametric command strings and request fill-in-the-blank or 
menu approaches when they had difficulty. (p. 241) 

More in line with this recommendation is the abbreviation 
mechanism. Make abbreviation can be used to reduce any 
combination of commands and text to a short abbreviation. 
Some of these may be stored on soft keys. Otherwise, 
pressing abbreviation signals that the following sequence of 
characters (terminated by go ahead) should be expanded. 
This follows a recommendation of Nickerson and Pew [30]: 

A means should be provided for the user to modify the 
language and redefine terms. For example, an individual who 
finds himself using a small set of commands very frequently 
might find it economical to replace each of these commands 
with a single-character abbreviation. Insofar as possible, he 
should be allowed to establish equivalences of this sort. 
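
A user-defined abbreviation facility of this kind can be sketched as a table lookup applied to the key stream. The sketch below assumes that keys arrive as a list of strings, that the special keys are represented by the strings "abbreviation" and "go ahead", and that an abbreviation expands to a fixed key sequence; these representations and the name expand_keys are illustrative assumptions, not Etude's design.

```python
def expand_keys(keys, table):
    """Replace each `abbreviation <letters> go ahead` run with its expansion."""
    expanded = []
    stream = iter(keys)
    for key in stream:
        if key != "abbreviation":
            expanded.append(key)
            continue
        letters = []
        # Collect the abbreviation name up to the terminating go ahead key.
        for inner in stream:
            if inner == "go ahead":
                break
            letters.append(inner)
        else:
            raise ValueError("abbreviation not terminated by go ahead")
        name = "".join(letters)
        if name not in table:
            raise ValueError(f"unknown abbreviation {name!r}")
        expanded.extend(table[name])
    return expanded


# What "make abbreviation" might have stored earlier (hypothetical entry).
table = {"ep": ["go-to", "end-of", "paragraph"]}

print(expand_keys(["erase", "sentence",
                   "abbreviation", "e", "p", "go ahead"], table))
```

Because the expansion is an ordinary key sequence, an abbreviation can mix commands and text, as the paragraph above requires.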

5. Preventing and Correcting Mistakes 

The idea that an easy to use system should be forgiving is 
becoming a generally accepted principle in user interface 
design. It would seem to be self-evident that a system 
should allow a user to correct mistakes, but it is not clear 
what mechanism or combination of mechanisms is best for 
making error correction a simple task. In Etude, there are 
four mechanisms directly related to error prevention and 
correction: the use of confirmation, the ability to edit a 
command before it is completed, the cancel key, and the 
undo key. 

5.1. Confirmation 

Commands such as erase and copy require confirmation 
from the user before being executed. For example, as the 
user specifies the region to be erased, the region is 
displayed in reverse video; the user can then press go ahead 
to confirm that this region is to be erased, or press cancel if 
the action is mistaken. This follows a recommendation of 
Engel and Granda [8]: 

If a user is performing an operation on some item on a frame, 
highlight that item so that the user has feedback on which 
item is to be worked on, which may not be what he thinks he 
has specified. 

It also corresponds to an idea of Rohlfs’ [36]: 

There should be a non-final delete function; one that allows 
the reproduction of deleted items: a parallel with pencil and 
rubber, which leaves a still readable faint mark of the old text. 

Confirmation is also used — and abused — in data entry 
interfaces. The same cautions that are recommended in 
that area by Gaines and Facey [9] need to be considered in 
text processing applications as well: 

Validation: Validate data on entry by checking syntax and 
values, but beware of rejecting data, or querying too much as 
being outside norms. Have the user himself revalidate major 
updates before acting upon them. 

In editing, it has long been recognized that “dangerous” 
commands should be confirmed [8]: 

An extensive, final and permanent change to data should not 
be allowed without showing and/or indicating to the user the 
results of his contemplated change. 

In Etude, confirmation is required for all commands which 
cause major changes in a document. Erasing small portions 
of text (such as a single line or sentence) does not require 
confirmation. 

While there are many methods of highlighting a region, 
using reverse video is attention-grabbing without being 
annoying (a fault associated with excessive use of blinking). 
Engel and Granda [8] comment further: 

Provide maximum contrast of a highlighted item with a non-highlighted 
item. This seems best done with text by reversing 
the image (dark on a light background, for example) of the 
item specified. 

5.2. Editing of Commands 

As commands are given to Etude, they are echoed in the 
command line. These commands may be edited by using a 
subset of the editing commands, such as the back-space and 
back-word keys. The use of back-space is similar to one of 
Walther's flexibility options [43] that were previously 
discussed in the section on abbreviations. Engel and 
Granda [8] advocate an extensive command editing facility: 

A user must be able to alter a line of input during entry and 
after entry. After back spacing with a non-destructive cursor 
and retyping, use of a forward tab or the "ENTER" key 
should signify to the system that the rest of the line should 
remain intact. 

While the advantages in being able to correct errors that 
one can catch before they do any harm are evident, care 
should be taken that the command editing facility does not 
cause confusion with the command syntax. This is 
especially applicable to commands that require cursor 
movement. 

5.3. Cancel 

The cancel key is used to abort an operation that is 
currently being specified or is in progress. This contrasts 
with the undo key, described below, which is used to 
reverse the effects of an already completed operation. 

The design folklore contains several references to the idea 
of a "reset" key which is similar to the cancel operation. 
These recommendations are based on the desirability of 
giving the user an easy to understand control over the 
system. Gaines and Facey [9] state this clearly: 


Observability and controllability: Envision the system as an 
automaton controlled by the user and make it simple to 
observe and control. In particular, provide a "reset" 
command which aborts the current activity cleanly. 

Gilb and Weinberg [10] elaborate: 

Provide an “escape to square one" button or simplest possible 
procedure that always works to untangle the worst possible 
procedural knot by simply undoing all that went on since the 
last "square one.” 

Hansen [12] also brings in an issue of keyboard design with 
his recommendation: 

One means of reducing the user’s interaction effort is to 
design the system so the user can operate it on ’muscle 
memory.’ .... A button should not have more than a few state 
dependent meanings and one button should be reserved to 
always return the system to some basic control state. 

5.4. Undo 

The notion of an undo function is another idea that is 
gaining wide acceptance. Many different levels of function- 
ality have been implemented in various systems, ranging 
from the relatively simple capabilities in Bravo [20] to the 
far-ranging abilities in Interlisp [41]. Etude provides an 
undo key that can reverse the result of the immediately 
preceding operation or sequence of operations. 

The undo operation is probably the single most important 
factor in relieving a user’s anxiety and alleviating the 
feeling of “walking a tightrope while wearing a blindfold” 
[10]: 

Protect the system against destruction or damage from wrong 
procedures. Without such protection, the operator is working 
under incredible pressure, which will only tend to increase 
mistakes. 

User training for Etude emphasizes the presence of the 
undo key and encourages experimentation. When demon- 
strating the prototype system, we are often asked “What 
if...” questions: the undo key gives us the ability to say 
“Well, let's try it and see what happens; we can always 
undo it if it isn’t right.” A system that encourages 
experimentation is one major step closer to being more 
natural to use. According to Jones [16], this step is of 
greater importance than that taken by attempting “natural 
language” interfaces due to the basic nature of the 
experimental strategy: 

Just as mathematics is composed of both algorithm and 
notation, so communication is as much a matter of procedure 
as it is of language. The search for a near-English vocabulary 
and syntax will fail to bridge the gap. Instead, we should be 
seeking to base our dialogues on the procedures and strategies 
which humans adopt in communicating with each other. 

These procedures are based on the four principles of 
expectation, implication, experimentation and motivation. 


Any system dealing with tasks as complex as the interactive 
editing and formatting of documents is going to find the 
ability to experiment to be a useful capability, no matter 
how simple the user interface appears to be. And 
regardless of the complexity or simplicity of the task, 
people are going to make mistakes [3]: 

Whatever the task, users can be counted on to make mistakes. 
This is especially true for people who are infrequent users. A 
very important inhibitor to increased use of computer power 
is a fear, real or imagined, that work will have to be repeated. 

If a user is hurt for this reason, he is not likely to forget or 
forgive. 

There are several questions involved in the implementation 
of an undo mechanism for which we have neither experi- 
mental data nor folklore to guide us. How far back should 
a user be allowed to undo something? Restricting undo to 
the immediately preceding operation seems too strict, but 
an unlimited undo also poses problems. Besides complicating 
the means of specifying which operations should be 
undone, does it make sense to undo an isolated operation 
performed many minutes ago when the current context 
may be completely different? 

Etude resolves these problems by allowing a user to undo 
an arbitrary length sequence of the immediately preceding 
operations. The user interface presents a list of the 
preceding operations as a menu and asks the user to specify 
how far back the undo should go. Since undo performs the 
inverse of the original operation and any operation can be 
undone, the user can subsequently undo part or all of the 
effects of a given undo command. This mechanism 
completely satisfies the goals given by Davies and Yates [7]: 

When the user makes an error he wants to backup to the last 
correct point in his interaction. This may not always be 
possible but the system should give as much help as it can. 
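
One way to obtain an undo that can itself be undone, in part or in whole, is to record every undone step as a new operation in the history. The sketch below makes that assumption explicit; it models the document as a plain string with two invertible operations, and the class and method names (Insert, Erase, Editor, menu, undo) are hypothetical. The source does not describe how Etude implements its undo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str

    def apply(self, doc):
        return doc[:self.pos] + self.text + doc[self.pos:]

    def inverse(self):
        return Erase(self.pos, self.text)


@dataclass(frozen=True)
class Erase:
    pos: int
    text: str  # the erased text is kept so the operation can be inverted

    def apply(self, doc):
        end = self.pos + len(self.text)
        if doc[self.pos:end] != self.text:
            raise ValueError("document does not contain the text to erase")
        return doc[:self.pos] + doc[end:]

    def inverse(self):
        return Insert(self.pos, self.text)


class Editor:
    def __init__(self, text=""):
        self.text = text
        self.history = []

    def do(self, op):
        self.text = op.apply(self.text)
        self.history.append(op)

    def menu(self, limit=5):
        """Preceding operations, most recent first, for the undo menu."""
        return list(reversed(self.history[-limit:]))

    def undo(self, count=1):
        """Undo the last `count` operations by performing their inverses."""
        if not 1 <= count <= len(self.history):
            raise ValueError("cannot undo that many operations")
        # Snapshot first: each inverse is appended to the history by do().
        for op in list(reversed(self.history[-count:])):
            self.do(op.inverse())


ed = Editor("Dear Sir")
ed.do(Insert(8, ","))
ed.do(Erase(0, "Dear "))
ed.undo(2)      # back to the starting text
ed.undo(1)      # undoes only the last step of that undo
print(ed.text)
```

Because the inverses are ordinary history entries, the second undo above reverses only part of the first one, which matches the behavior described in the paragraph before the quotation.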

6. On-line Assistance 

Another function that is becoming commonly accepted as 
desirable for nearly any system is that of on-line assistance, 
typically offered by way of a help function. This usually 
allows the user to refer to information about the system 
while he is on-line, rather than forcing the user to refer to 
manuals or other forms of off-line documentation which 
would interfere with the conversational nature of the 
interactive system. Exactly how much information is 
provided and how it can be selected is a matter that differs 
considerably from system to system. Related to a help 
facility is a menu facility, which lists the available options at 
the particular time. Etude provides both help and menu 
keys. 

On-line assistance can serve several functions. It can be 
used to provide a tutorial for the new user, as mentioned by 
Kennedy [17]: 


The computer system should give help when requested or 
whenever it perceives that the user is in difficulty. 
Self-teaching is recommended. 

More typically, on-line assistance can help the user who has 
lost his place in an operation or who forgets what he should 
do next. This can be of use to all categories of users, 
though as Martin [25] suggests, such a facility is most 
typically associated with novices and casual users: 

A red key on the terminal, labeled "HELP," for example, 
could be of great value. Most operators, especially new ones 
whom we want to win over to this way of life, are going to be 
bewildered occasionally at the terminal and will not know 
what to do next. Pressing the HELP key will cause the 
computer to give them the facts they need. A HELP 
conversation might begin with a "menu" screen asking what 
help is needed .... After being "helped," the operator must 
have an easy way of returning to the main dialogue. This will 
be true after other types of interruptions, and can be 
accomplished with a CONTINUE key. Pressing this key 
would return the system to its point of interruption. (p. 131) 

The importance of this type of assistance for casual users, 
especially discretionary users such as managers, is described 
by Bennett [3]: 

The wide variety of functions available through a computer 
terminal makes it unlikely that a user can sit down and begin 
immediately to get useful results without some review. The 
amount of review needed is especially important for discre- 
tionary users who are irregular in their time spent at the 
terminal. 

Besides explaining what can be done, on-line assistance can 
tell a user what he has already done. Etude’s help key will 
inform the user of the last few operations that have been 
performed as well as explaining the available options. 
Information about earlier operations can be retrieved using 
the query-in-depth method described below. Hayes, Ball 
and Reddy [13] recommend this type of facility: 

A gracefully interacting system should be able to give 
explanations of both a static and dynamic nature to its user. 

Static explanations relate to what the system can and cannot 
do in a general sense, and how the user can ask the system to 
do a specific task. Dynamic explanations describe what the 
system is doing and why it is doing it, and the outcomes of 
past events. 

Both novices and experienced users will forget details of 
system usage or be unaware of all the possibilities within a 
large number of choices. In Etude, this can especially occur 
when selecting the class of a component. In this case, 
menus can be the easiest way to provide assistance, as 
Hansen observes [12]: 

Because the user forgets, the computer memory must 
augment his memory. One important way this can be 
accomplished is by observing the principle selection not 
entry. Rather than type a character string or operation 
name, the user should select the appropriate item from a list 
displayed by the computer. 


Hansen appears to believe that menus should always be 
used: note the use of "selection not entry" as opposed to 
"selection or entry." However, unless the menus are as easy 
and quick to use as direct entry, their forced use may 
encumber experienced operators. The required use of 
menus also violates Hansen’s own principle of display 
inertia: 

As the system reacts to a user's request, it should observe the 
principle of display inertia. This means the display should 
change as little as necessary to carry out the request. 

If help and menu facilities are adopted, how can they best 
be implemented? One criterion should be the suitability of 
the mechanism to a broad range of users. A user might 
require anything from the briefest reminder of what can be 
done to the most detailed explanation of a given situation. 
One way to achieve this goal is to use the query-in-depth 
method proposed by Gaines and Facey [9]: 

Query-in-depth: Distribute information and tutorial material 
appropriately throughout the system to be accessed by the 
user through a simple uniform mechanism. 

An example of using query-in-depth is a help facility that is 
invoked by a question mark. The first time a question mark 
is typed, a short piece of information is provided. Repetition 
of the question mark results in successively more 
detailed explanations, until the most detailed level (perhaps 
advising where to get human assistance) is reached. 

To facilitate query-in-depth, Etude treats help and menu in 
much the same way: information provided in response to 
help is structured like a menu, but with more verbose 
description included. In either case, the user can then move 
around the menu to select an item. Pressing help will 
provide more detailed information about this particular 
item. 
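
The question-mark example of query-in-depth given above can be sketched as a small state machine: each repeated request returns the next, more detailed explanation and stays at the most detailed level once it is reached. The class name, the reset method, and the help texts are invented for illustration; they are not Etude's messages.

```python
class QueryInDepth:
    """Successively more detailed help for one item."""

    def __init__(self, levels):
        if not levels:
            raise ValueError("at least one level of explanation is needed")
        self._levels = list(levels)
        self._depth = 0

    def ask(self):
        """Return the explanation for the current depth, then go deeper."""
        index = min(self._depth, len(self._levels) - 1)
        self._depth += 1
        return self._levels[index]

    def reset(self):
        """Start again from the briefest reminder (e.g. after another key)."""
        self._depth = 0


erase_help = QueryInDepth([
    "erase: removes text",
    "erase <modifiers> <object>, for example: erase next sentence",
    "Large erasures are highlighted and must be confirmed with go ahead.",
])

for _ in range(4):
    print(erase_help.ask())   # the fourth request repeats the last level
```

A single uniform mechanism of this kind serves both the user who wants a one-line reminder and the user who needs the full explanation.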

Some other guidelines for menu implementation may also 
be useful. For example, menus in many of today’s systems 
make the user specify items in the menu by number. This 
is not necessarily the easiest way to do things, especially if 
the system is designed so that menus are strictly optional. 

In this case, the following recommendation by Hansen [12] 
is especially applicable: 

The second principle to avoid memorization is names not 
numbers. When the user is to select from a set of items he 
should be able to select among them by name. 

In Etude, for example, the user can type in the name of an 
item whether or not he has seen a menu. If he uses a menu, 
then he may also choose an item by using cursor movement 
commands to move within the menu (the currently selected 
item in the menu is always displayed in reverse video; when 
the menu first comes up, the first item is chosen by default 
as the “currently selected item”). 


Only currently permissible items should be displayed in 
menus; items that might be relevant at other times in 
similar situations should not be included. Such information 
generally does not help the user, and in fact may degrade 
his performance since the extra information increases the 
length of the menu and consequently increases the time the 
user needs to find the correct item [1]. One application of 
this principle in the Etude prototype occurs when a user has 
typed the first few characters of a name and then asks for a 
menu; only the items starting with the characters already 
typed are displayed. 
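
The two menu rules just described, showing only the items that match what has already been typed and starting with the first item selected, can be sketched as follows. The Menu class and its method names are hypothetical, and clamping the selection at the ends of the list is an assumption; the text does not say what Etude does when the cursor moves past the last item.

```python
class Menu:
    """A menu restricted to the currently permissible items."""

    def __init__(self, items, typed=""):
        # Only items starting with the characters already typed are shown.
        self.items = [item for item in items if item.startswith(typed)]
        self.selected = 0  # the first item is the default selection

    def move(self, steps):
        """Move the selection with the cursor keys, staying inside the menu."""
        if self.items:
            last = len(self.items) - 1
            self.selected = max(0, min(last, self.selected + steps))

    def current(self):
        """The item that would be shown in reverse video."""
        return self.items[self.selected] if self.items else None


components = ["address", "body", "closing", "notations",
              "quotation", "salutation", "sentence"]

menu = Menu(components, typed="s")
print(menu.items)      # only the two names starting with "s"
menu.move(1)
print(menu.current())
```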

Menus that obscure the portion of text that the user was 
working on violate the principle of display inertia and may 
cause the user to fear that his work has been lost. In Etude, 
menus are positioned so as never to obscure the text 
surrounding the current cursor position. 

7. Feedback and Error Messages 

During Etude's implementation, constant attention was 
paid to providing immediate feedback to the user. Every 
keystroke that the user types causes some response on the 
screen. The following Gaines and Facey principle [9] was 
strictly adhered to: 

Immediate feedback: Give the user feedback by making an 
immediate unambiguous response to each of his inputs. This 
should be sufficient to identify the type of activity taking 
place. 

When commands are completed, a form of closure has been 
reached [27]. Sometimes, in complex commands, the 
results of the command cannot be displayed immediately. 
However, any command echoed in the command line 
(which includes all actions except for text insertion and the 
use of "arrow" keys ↑, →, ↓, and ← for cursor movement) 
displays an "[ok]” before the command is executed to 
acknowledge that the user has finished specifying the 
command. This is in accordance with a guideline suggested 
by Rouse [37]: 

The computer should respond quickly or, if that is not 
possible, the user should be provided with some feedback that 
his program is operating. 

Etude follows many of Rohlfs’ [36] guidelines for the design 
of error messages and their presentation to the user, such as 
always putting error messages in the same place on the 
screen and avoiding numeric codes, unknown abbrevi- 
ations, and computer terminology. Messages and prompts 
are kept straight and to the point. They are never cute and 
folksy, a fault which Engel and Granda warn of [8]: 

No attempts at humor or punishment should be made. 

People are still threatened by an anthropomorphic machine, 
and until the optimal ‘personality’ of a computer can be 
derived, keep dialog strictly factual and informative. 


8. Response Time 

As the reader may have noticed, efficiency in terms of 
speed and space requirements was not high on the list of 
Etude's design goals. In building the prototype, we were 
much more concerned with getting the system to function 
correctly than with getting it to function efficiently. 

However, response time is involved in several human 
factors considerations for a user interface. The relationship 
is not simply "faster is better," but involves considerations 
of thresholds of acceptability, response time variability, and 
pacing of the user. The prototype version did not meet 
requirements in this area. Major changes in the new 
implementation should alleviate this problem. 

No interactive system is easy to use if it is too slow to keep 
up its end of the interaction. Robert Miller's oft-cited 
paper on response time requirements [27] proposes a 2 
second figure as a general rule for many types of 
interaction. Much faster responses are needed for things 
such as echoing typed input (instantaneous, < .1 second), 
and slower responses are acceptable, if not desirable, at 
points where a major closure has been achieved. However, 
Miller mentions that there is a clear break in acceptability 
at the 15 second point: 

In any event, response delays of approximately 15 seconds, 
and certainly any delays longer than this, rule out 
conversational interaction between human and information 
systems. 

Miller and several other authors hypothesized that variability 
of response time was perhaps of more importance 
than absolute response time, assuming that the latter was 
within tolerable limits. Experimental validation of this 
hypothesis was later provided by Lawrence Miller [26]: 

The effects of varying CRT display rates and output delays 
upon user performance and attitudes in a series of message 
retrieval tasks were evaluated experimentally. The results 
support the somewhat surprising conclusion that doubling the 
display rate from 1200 to 2400 baud produces no significant 
performance or attitude changes; increasing the variability of 
the output display rate produces both significantly decreased 
user performance and a poorer attitude towards system and 
interactive environment. 

It has also been theorized that response times that are too 
fast push the user too hard and induce additional stress in 
trying to keep up with the machine. Others mention that 
too-rapid interruptions with error messages are both 
impolite and unproductive. Among those concerned with 
excessively fast response times is Kennedy [17]: 

The rate of exchange must be within the user’s stress-free 
working range. Control of the rate should always appear to 
belong to the user. 

9. Hardware Considerations 

While most of this paper has discussed matters that 
primarily concern the software design of the system, it is 
important to consider factors regarding the display, the 
workstation, keyboard layout, and provisions for alternative 
input and control devices. Cakir, Hart and Stewart’s 
comprehensive report on visual display terminals and their 
workstations [5] includes extensive design checklists. 

The selection and design of input and control devices for 
Etude is a lively discussion topic within the design group. 
Since we are just starting to consider issues of keyboard 
design in detail, specific implementations of recommendations 
cannot be listed. However, one simple change to 
the operation of the standard keyboard that has been 
incorporated into Etude and many commercial word 
processors can save the user time and effort [18]: 

A system which monitors keystrokes and automatically inserts 
RETURN codes when the space bar is hit in the "hot zone” at 
the end of the line will save a minimum of 250 millisec./line, 
or 2.1% with almost no operator retraining. Any time spent 
deciding whether or not to hit RETURN is also saved, and 
SPACE is frequently struck before RETURN in any case. 
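
The "hot zone" behavior in this quotation can be sketched as a filter on the key stream: a space typed within the last few columns of the line is turned into a line break. The parameter names line_width and hot_zone and their default values are assumptions chosen for the example; the source gives no figures for either.

```python
def type_with_hot_zone(keys, line_width=20, hot_zone=5):
    """Return the typed lines, breaking at a space struck in the hot zone."""
    lines = [""]
    for key in keys:
        in_hot_zone = len(lines[-1]) >= line_width - hot_zone
        if key == "\n":
            lines.append("")            # an explicit RETURN still works
        elif key == " " and in_hot_zone:
            lines.append("")            # the space becomes an automatic RETURN
        else:
            lines[-1] += key
    return lines


for line in type_with_hot_zone("the quick brown fox jumps"):
    print(line)
```

With these settings the space after "brown" falls in the hot zone, so "fox jumps" starts a new line without the operator striking RETURN.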

There is a danger in relying too heavily on a special 
keyboard due to the constantly changing nature of most 
computer systems. Martin [25] warns of this problem: 

It is often difficult for the systems analyst to anticipate what 
change will occur or how rapidly it will come. If he designs 
his dialogue around a custom-built keyboard, he is building 
resistance to change into the system. (p. 143) 

Etude tries to circumvent some of these restrictions by 
emphasizing that only the most common special functions 
are found on the keyboard, thus eliminating the need for 
ever-expanding keyboards, and by using soft keys. How- 
ever, the problem of identifying the most commonly used 
keys still remains. The problem is not as critical in Etude as 
in commercial systems since we have a large amount of 
liberty in redefining keys as design and development 
progresses. 

The question of using alternate input and control devices 
such as "mice,” light pens, tablets, and touch keyboards 
seems more an issue of theology than of practical guidelines 
at this stage. Card et al. [6] report an experiment where the 
mouse was shown to be preferable to two different types of 
keyboard mechanisms and one type of joystick, approach- 
ing theoretically optimum performance for a pointing 
device (with the exception of use with single character 
targets). However, one should note that only one type of 
joystick (rate-controlled isometric) was tested, and that the 
arrow keys on the tested system were very slow compared 
to those provided in Etude and most commercially avail- 
able CRT’s. Engel and Granda [8] generally recommend 
using joysticks. 



The principal advantages cited for keyboard-oriented ap- 
proaches such as those found in the Etude prototype are 1) 
the intended users of the system are more familiar with 
keyboards than with the alternative devices, 2) the user 
does not have to move one of his hands from the keyboard 
to position the cursor, and 3) the added problems with extra 
equipment are avoided. These problems include placement 
of the extra equipment at the workstation and potential for 
breakage and theft. 

Etude’s approach to this problematical area is to provide for 
one or more pointing devices, without constraining the 
choice of specific devices. The arrow keys currently serve 
as a pointing device, but the system architecture allows for 
other devices such as a mouse or a joystick to be added at a 
later date. 
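One way to leave the choice of pointing device open is to have every device report cursor motion through the same small interface. The sketch below is hypothetical: the class and method names are invented for illustration and do not describe Etude's actual architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cursor:
    row: int = 0
    col: int = 0


class ArrowKeys:
    """The device present in the prototype: one step per key press."""
    STEPS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def motion(self, event):
        return self.STEPS[event]


class Mouse:
    """A hypothetical later addition: events already carry a displacement."""

    def motion(self, event):
        return event


def move(cursor, device, event):
    """Apply one device event to the cursor, never moving above or left of 0."""
    d_row, d_col = device.motion(event)
    return Cursor(max(0, cursor.row + d_row), max(0, cursor.col + d_col))
```

The editor calls only `move`, so adding a joystick would mean adding one more class with a `motion` method.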

10. Miscellaneous 

Many recommendations to be found in the folklore of user 
interface design are not easily categorized. Some of these 
can be related directly to certain features of Etude. Others 
are more general but worthy of mention. 

10.1. Log Keeping 

The same mechanism which facilitates help and undo is also 
used to keep a system log, a facility recommended by 
Gaines and Facey [9]: 

Log activities: Use the computer to maintain records of 
system and user activities to evaluate the behavior of both 
system and users. 

Hansen [12] also emphasizes the utility of a log: 

It is not enough to simply tell the user of his errors. The 
system designer must also be told so he can apply the 
principle engineer out the common errors. If an error 
occurs frequently, it is not the fault of the user, it is a problem 
in the system design. 
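A log becomes useful to the designer once it is summarized. The following minimal sketch assumes a hypothetical log format (a list of records with an "error" field, which is not Etude's format) and reports the errors that occur often enough to suggest a design problem.

```python
from collections import Counter


def common_errors(log, threshold=3):
    """Return (error, count) pairs for errors seen at least `threshold` times.

    `log` is assumed to be a list of dicts; entries whose "error" value is
    missing or None are treated as successful commands.
    """
    counts = Counter(entry["error"] for entry in log if entry.get("error"))
    return [(name, n) for name, n in counts.most_common() if n >= threshold]
```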

10.2. Beyond Commands 

Thomas [42] has described the need to permit metacom- 
ments about the current dialogue in a systematic way within 
the computer system: 

People communicate not only about the topic under 
discussion, but also regarding direction of the conversation, 
speed of the conversation and internal state of the conversers 
.... In many computer systems, there are sporadic attempts to 
provide for some metacomments. Typically, these are added 
features. They include such items as a "temperature- 
humidity index" that tells the user how busy the system is, an 
on-line facility about what is available, a "comments" 
command, or the ability to change "topics" by always 
returning to an operating system. An alternative strategy to 
adding metacomment capabilities to computer systems piece- 
meal is to recognize from the beginning of systems design that 
the types of metacomments listed above are found in natural 
language communication for a reason, and that a computer 
system should also provide for these functions. 

Similarly, Mann [23] claims that current systems are hard to 
use because their vocabulary is limited nearly entirely to 
commands, whereas humans state goals, give examples, 
describe, clarify, hypothesize, use analogies, and make 
comparisons in addition to using commands. Elevating 
command usage from a minor portion of dialogue, as it is in 
natural language, to the dominant portion makes the new 
type of dialogue unnatural for people to learn and use. 

Etude has some ad hoc metacomment facilities. Help, 
menu, and cancel either influence the direction of the 
dialogue and/or provide information about the state of one 
or both conversers. The confirmation facility provides a 
primitive mechanism for determining the pace of the 
interaction. In addition, the suggestion for logging facilities 
given by Hansen [12] and the INTERRUPT key suggested 
by Martin [25] are examples of metacomment facilities. 
Integrating these various functions as well as methods for 
customizing and controlling the pace of the dialogue into a 
general mechanism would be a challenging task, but one 
that could pay off handsomely in increased ease of use. 

10.3. Simplicity 

Shneiderman [38] presents desiderata for simplicity, 
though he gives no means for quantifying his goals: 

An interactive system is simple if it has few commands and if 
the commands have a consistent structure. Output should be 
readable and error messages should have a uniform format. 
Command structures should match the problem domain and 
the sequence of user thought processes. (p. 255) 

In the same vein, Gaines and Facey [9] issue a plea for 
consistency which echoes throughout the folklore: 

Consistency and uniformity: Ensure that all terminology and 
operational techniques are consistently applied, and uni- 
formly available, throughout all system activities. 

Engel and Granda [8] give a specific example of what not to 
do: 

Nomenclature must be the same for similar or identical 
functions across components, tasks, and roles for command 
names, subcommand names, and parameters. For example, 
don't use "edit" in one place, "modify" in another, and 
"update" in a third. 

Consistency and predictability are closely related issues, as 
Pew and Rollins [32] observe: 

A critical ingredient of friendliness is consistency. Once the 
user has learned a procedure or set of simple rules, he or she 
has the right to expect that they will always work in every 
context that is perceived to be similar to the context in which it 
was first encountered. 

By combining primitives to form more complex commands, 
Etude attempts to meet these goals of consistency and 
simplicity. We believe that these gains will be especially 
noticeable when additional functions are integrated with 
the editor and formatter. 

10.4. Social Issues 

As a final guideline, it is most important to consider the 
impact of the system being designed on the people who use 
it. Margulies [24] reminds us that measures other than 
efficiency are also important when evaluating computer 
systems: 

You will have to change your value system, recognising that 
the quality of your computer system has to be evaluated not 
in terms of output and revenue only but also in terms of job 
satisfaction and quality of life. 

11. Future Work 

Throughout this paper I have mentioned that a designer 
trying to produce a good user interface has to rely more on 
folklore than on sound engineering principles. To avoid 
falling into the category of those who talk about the 
problem without doing anything about it, I will soon 
perform an experimental evaluation of the Etude system. 

In this paper I have shown that Etude adheres to a large 
amount of the folklore regarding user interface design. I 
have not shown that Etude is in fact an easy to use system, 
nor has the term "ease of use" been defined. In my 
experiment, secretaries without any computer experience 
will be recruited as subjects and taught to use Etude. The 
following criteria, based on the work of Miller [28], will be 
used to measure ease of use: 

- Ease of learning: The proportion of subjects who can 
learn to use the system in a given amount of time. 

- Ease of use once the system is learned: The 
proportion of subjects who can complete a set of tasks 
in a given amount of time. 

- The subject’s state anxiety [40] when using the system. 

- The attitude of the subject toward the system, mea- 
sured by a Semantic Differential [22, 31]. 
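The first two criteria are both proportions of subjects who finish within a time limit. A minimal sketch of that computation follows; the function name and the convention that None marks a subject who never finished are assumptions made for the example.

```python
def proportion_within(times, limit):
    """Fraction of subjects whose completion time is at most `limit`.

    `times` holds one entry per subject; None (an assumed convention)
    marks a subject who did not complete the task at all.
    """
    if not times:
        raise ValueError("at least one subject is required")
    finished = sum(1 for t in times if t is not None and t <= limit)
    return finished / len(times)
```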

If Etude is indeed easy to use, there will be some further 
experimental validation for the guidelines contained in the 
folklore. If it is not easy to use, there should be enough 
information present for the reader to decide whether the 
fault rests with some of the guidelines or with their 
implementation in Etude. This study should add to the 
slowly growing body of literature that is making research in 
user interface design less like the weather and more like 
sound engineering. 


References 

1. Baker, James D. and Ira Goldstein. Batch vs. Sequential 
Displays: Effects on Human Problem Solving. Human 
Factors 8 (1966), 225-235. 

2. Bennett, John L. The User Interface in Interactive 
Systems. In Annual Review of Information Science and 
Technology, Vol. 7, C. A. Cuadra, Ed., American Society for 
Information Science, Washington, 1972, pp. 159-196. 

3. Bennett, John L. The Commercial Impact of Usability 
in Interactive Systems. In Man/Computer Communication, 
Vol. 2, Infotech State of the Art Report, Maidenhead, 
England, 1979, pp. 1-17. 

4. Bobrow, Daniel G. et al. TENEX, a Paged Time 
Sharing System for the PDP-10. Comm. ACM 15 (1972), 
135-143. 

5. Cakir, A., D. J. Hart and T. F. M. Stewart. Visual 
Display Terminals. John Wiley & Sons, New York, 1980. 
Originally published in 1979 by The Inca-Fiej Research 
Association, Darmstadt. 

6. Card, Stuart K., William K. English and Betty J. Burr. 
Evaluation of Mouse, Rate-Controlled Isometric Joystick, 
Step Keys, and Text Keys for Text Selection on a CRT. 
Ergonomics 21 (1978), 601-613. 

7. Davies, Donald W. and David M. Yates. Human 
Factors in Display Terminal Procedures. Proc. Fourth 
International Conference on Computer Communication, 
International Council for Computer Communication, 
Sept., 1978, pp. 777-783. 

8. Engel, Stephen E. and Richard E. Granda. Guidelines 
for Man/Display Interfaces. Tech. Rep. TR 00.2720, IBM 
Poughkeepsie Laboratory, December 19, 1975. 

9. Gaines, Brian R. and Peter V. Facey. Some Experience 
in Interactive System Development and Application. Proc. 
IEEE 63 (1975), 894-911. 

10. Gilb, Tom and Gerald M. Weinberg. Humanized 
Input. Winthrop, Cambridge, Mass., 1977. 

11. Hammer, Michael et al. Etude: An Integrated Docu- 
ment Processing System. Proceedings of the 1981 Office 
Automation Conference, AFIPS, March, 1981. 

12. Hansen, Wilfred J. User engineering principles for 
interactive systems. AFIPS Conference Proceedings, Vol. 
39, AFIPS Press, Montvale, N. J., 1971, pp. 523-532. 

13. Hayes, Phil, Eugene Ball, and Raj Reddy. Breaking the 
Man-Machine Communication Barrier. Computer 14 
(March 1981), 19-30. 

14. Ilson, Richard. An Integrated Approach to Formatted 
Document Production. Master Th., MIT, Aug., 1980. 




15. Ilson, Richard and Michael Good. Etude: An Inter- 
active Editor and Formatter. Memo OAM-029, MIT Lab. 
for Computer Science, Office Automation Group, March, 
1981. 

16. Jones, P. F. Four principles of man-computer dia- 
logue. Computer Aided Design 10 (1978), 197-202. 

17. Kennedy, T. C. S. The Design of Interactive Proce- 
dures for Man-Machine Communication. Internat. J. Man- 
Machine Studies 6 (1974), 309-334. 

18. Kinkead, Robin. Typing Speed, Keying Rates, and 
Optimal Keyboard Layouts. Proceedings of the 19th 
Annual Meeting, The Human Factors Society, Oct., 1975, 
pp. 159-161. 

19. Knuth, Donald E. TEX and METAFONT: New Direc- 
tions in Typesetting. American Mathematical Society and 
Digital Press, 1979. 

20. Lampson, Butler W. Bravo Manual. In Alto User's 
Handbook, Xerox PARC, 1979. 

21. Ledgard, Henry et al. The Natural Language of 
Interactive Systems. Comm. ACM 23 (Oct. 1980), 556-563. 

22. Lucas, R. W. A study of patients' attitudes to computer 
interrogation. Internat. J. Man-Machine Studies 9 (1977), 
69-86. 

23. Mann, William C. Why Things Are So Bad for the 
Computer-Naive User. Tech. Rep. ISI/RR-75-32, 
USC/Information Sciences Institute, March, 1975. 

24. Margulies, F. Technological Change: Its Impact on 
Man and Society. In Man/Computer Communication, Vol. 
2, Infotech State of the Art Report, Maidenhead, England, 
1979, pp. 251-261. 

25. Martin, James. Design of Man-Computer Dialogues. 
Prentice-Hall, Englewood Cliffs, N.J., 1973. 

26. Miller, Lawrence H. A study in man-machine inter- 
action. AFIPS Conference Proceedings, Vol. 46, AFIPS 
Press, Montvale, N.J., May, 1977, pp. 409-421. 

27. Miller, Robert B. Response time in man-computer 
conversational transactions. AFIPS Conference Proceed- 
ings, Vol. 33, Part 1, Thompson Book Co., Washington, 
1968, pp. 267-277. 

28. Miller, Robert B. Human Ease of Use Criteria and 
Their Tradeoffs. Tech. Rep. TR 00.2185, IBM Pough- 
keepsie Laboratory, April 12, 1971. 

29. Newman, William M. and Robert F. Sproull. 
Principles of Interactive Computer Graphics. McGraw-Hill, 
New York, 1979. Second Edition. 

30. Nickerson, Raymond S. and Richard W. Pew. Oblique 
Steps Toward the Human-Factors Engineering of Inter- 
active Computer Systems. Appendix to BBN Report No. 
2190 by Mario C. Grignetti et al., Information Processing 
Models and Computer Aids for Human Performance, June 
30, 1971. NTIS No. AD 732 913. 

31. Osgood, Charles E., George J. Suci, and Percy 
H. Tannenbaum. The Measurement of Meaning. University 
of Illinois Press, 1957. 

32. Pew, Richard W. and Ann M. Rollins. Dialog Specifi- 
cation Procedures. Report 3129, Bolt Beranek and New- 
man, Sept., 1975. Revised ed. NTIS No. PB-252 976. 

33. Plum, Thomas. Fooling the User of a Programming 
Language. Software — Practice and Experience 7 (1977), 
215-22. 

34. Pratt, V. R. DOC Manual. MIT, 1979. 

35. Reid, Brian K. A High-Level Approach to Computer 
Document Formatting. Conference Record of the Seventh 
Annual ACM Symposium on Principles of Programming 
Languages, ACM, Jan., 1980, pp. 24-31. 

36. Rohlfs, S. User Interface Requirements. In 
Convergence, Vol. 2, Infotech State of the Art Report, 
Maidenhead, England, 1979, pp. 165-199. 

37. Rouse, William B. Design of Man-Computer Inter- 
faces for On-Line Interactive Systems. Proc. IEEE 63 
(1975), 847-857. 

38. Shneiderman, Ben. Software Psychology. Winthrop, 
Cambridge, Mass., 1980. 

39. Slobin, Dan Isaac. Psycholinguistics. Scott, Foresman 
and Co., Glenview, Ill., 1979. Second edition. 

40. Spielberger, Charles D. Anxiety as an Emotional State. 
In Anxiety: Current Trends in Theory and Research, Vol. I, 
C. D. Spielberger, Ed., Academic Press, New York, 1972, 
pp. 23-49. 

41. Teitelman, Warren. Interlisp Reference Manual. 
Second revision edition, Xerox PARC, December 1975. 

42. Thomas, John C., Jr. A design-interpretation analysis 
of natural English with applications to man-computer 
interaction. Internat. J. Man-Machine Studies 10 (1978), 
651-668. 

43. Walther, George H. The On-Line User-Computer 
Interface: The Effects of Interface Flexibility, Experience, 
and Terminal-Type on User-Satisfaction and Performance. 
Ph.D. Th., University of Texas at Austin, Aug., 1973. NTIS 
No. AD-777 314. 




The Document Editor: 

A Support Environment for Preparing Technical Documents 


Janet H. Walker 
Bolt Beranek and Newman Inc. 

10 Moulton Street, Cambridge MA 02238 


1. Introduction 

As understanding of a particular problem domain 
matures, its tools become more specialized, moving from 
general low-level tools to more specialized high-level 
tools. Early work in a field applies the same set of low- 
level aids to all problems. Gradually more specialized 
tools emerge as we come to better understand the problem 
and the tools that are best for it. 

Since the emergence of the first program editors in the 
early 1960s we have learned a lot about the editing process 
and about building editors. Recently several editors 
specialized for editing program sources, text, and data 
structures have been developed. To date, little work has 
been reported on specialized document editors for editing 
complex text. This paper describes a research effort into 
identifying the requirements for an interactive 
environment for editing complex documents and an initial 
implementation for the environment. 

1.1. Structure editing 

LISP editors and other structure editors for 
programming languages [9] manipulate structures rather 
than text. In these editors, some expert builds an editor 
that "knows" the structural characteristics of the language 
and aids users of the editor perhaps, for example, by 
preventing them from entering any text that would not be 
syntactically acceptable in the language. This approach 
runs into several problems in the documentation world, 
not the least of which is the lack of any syntactic 
description of what constitutes a valid document. One of 
the goals of this work is to develop a structural description 
for documents that is distinct from any particular 
commands in the document source. 

Permission to copy without fee all or part of this material is 
granted provided that the copies are not made or distributed 
for direct commercial advantage, the ACM copyright notice and 
the title of the publication and its date appear, and notice is 
given that copying is by permission of the Association for 
Computing Machinery. To copy otherwise, or to republish, 
requires a fee and/or specific permission. 

© 1981 ACM 0-89791-043-5/81/0600/0044 $00.75 


1.2. Technical writing environments 

Writers of technical documents require facilities for 
assembling, creating, modifying, formatting, and 
maintaining their documents. The tools conventionally 
applied to these tasks are a standard text editor, some form 
of document formatter, and a message system, all 
operating under some sort of operating system and its 
command interpreter. The various tools operate on files 
consisting of text strings. An ideal document preparation 
system integrates the functions of these tools into a unified 
environment for managing technical text. 

The present work has several allied goals: 

• Identify the processing requirements of complex 
technical text and the tools to support these 
requirements. 

• Design an interactive working environment for 
technical writers incorporating those tools. 

• Build a working system and gain enough experience 
to evaluate the selection of tools and the design. 

The design work has proceeded at several levels: the 
identification of a dream system (what would be 
ultimately desirable), the design of a realistic target 
system, and the design of a preliminary implementation. 
This paper characterizes the target system in fairly abstract 
terms, and describes in more concrete detail the system 
that is being implemented. 


2. An ideal system 

The computer system design process is often three- 
tiered. First, an ideal system is proposed, whose design 
ignores all implementation considerations. Second, a 
realistic design is achieved by restricting the goals of the 
ideal system because of limitations of the computer 
environment or considerations of efficiency or manpower. 
Finally an implementation is achieved, which is an 
approximation to the restricted design. This section 
outlines my pipe-dream system, designed entirely without 
regard to whether it is possible to implement. Following 
sections describe the restricted design and its 
implementation. 




Ideally, the Document Editor manipulates a data 
structure that is a canonical representation of a structured 
document. The actual representation of the document — 
some kind of text— is parsed into this internal form as an 
“import” operation when the document is loaded into the 
document editor. 

Upon exit, the editor unparses its internal form back 
into the appropriate text file in the document preparation 
language. The structure of each document is maintained 
independently of the language in which that structure is 
represented, and the structure-editing operations work not 
on the textual representation but on the structure itself. In 
this regard, the Document Editor is similar to the NLS 
Augment system, as opposed to a pure structure editor like 
GANDALF [12, 3]. 

Adding a new document language to this editor’s 
repertoire consists primarily of writing the parser and 
unparser for that language. The difficulty of that task 
depends on the nature of the language. 
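The import and export steps can be illustrated with a toy parser and unparser pair. This is a sketch under strong assumptions rather than the system's design: it recognizes only headings written as @Chapter(...), @Section(...), and @Subsection(...) on lines of their own, which is a drastic simplification of Scribe, and the Element class and function names are invented for the example.

```python
import re
from dataclasses import dataclass, field

LEVELS = ["chapter", "section", "subsection"]   # assumed hierarchy
HEADING = re.compile(r"@(\w+)\((.*)\)$")


@dataclass
class Element:
    level: str
    title: str
    text: list = field(default_factory=list)       # body lines before any child
    children: list = field(default_factory=list)   # nested elements


def parse(source):
    """Import: turn document text into a tree of Elements."""
    root = Element("document", "")
    stack = [root]
    for line in source.splitlines():
        m = HEADING.match(line.strip())
        if m and m.group(1).lower() in LEVELS:
            depth = LEVELS.index(m.group(1).lower()) + 1
            del stack[depth:]            # close elements at this depth or deeper
            node = Element(m.group(1).lower(), m.group(2))
            stack[-1].children.append(node)
            stack.append(node)
        else:
            stack[-1].text.append(line)
    return root


def _lines(element):
    lines = []
    if element.level != "document":
        lines.append(f"@{element.level.capitalize()}({element.title})")
    lines.extend(element.text)
    for child in element.children:
        lines.extend(_lines(child))
    return lines


def unparse(element):
    """Export: regenerate document text from the tree."""
    return "\n".join(_lines(element))
```

Structure-editing commands would operate on the Element tree; supporting another document language would mean writing a different `parse` and `unparse` over the same tree.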

The users can choose to view the editor as either a text 
editor or a structure editor. Text editing commands 
operate as if the internal representation were text and their 
effect is mapped back into the structure on completion. 
Structure editing commands operate directly on the 
internal representation. The text editing commands 
manipulate normal text; the user deals in characters, 
words, lines, sentences, and paragraphs. The structure 
editing commands manipulate only structural elements of 
the text; the user issues requests to promote, demote, 
rearrange, or process a structural element. 

Some of the commands are local, fine-grained 
commands that might add a letter, replace a word, or 
delete a line. Other commands are global, large scale 
commands that print a chapter, find all references to a 
cross-referenced label, or generate an index. From the 
user’s point of view, all of the power is part of the text 
editor and the implementation of structure is transparent. 

Associated with each document is a structural 
description that identifies the names and relationships of 
the hierarchical levels. In some cases, Scribe documents 
for example, the structural description is expressed by the 
formatter’s database and can be extracted from it 
automatically. In other cases, the user must associate an 
explicit structural description with each document. 

Since one of the major overall goals of this research 
project was to identify useful primitives in a document 
editor, it seemed injudicious to implement an editor from 
scratch before having a strong idea of what it was 
supposed to do. I therefore adopted a more modest goal, 
namely an editor that can manipulate documents 
represented in a single document language (Scribe) that 
could be implemented on top of an existing text editor 
(EMACS). The remainder of this paper describes that 
design and implementation. 


3. System Design 

The Document Editor is conceptually a multi-process 
executive whose top level is a text editor. This text editor 
is the command interface for the rest of the system, and 
incidentally also edits text. Each of the commands 
invoked by the user is dispatched by this top-level text 
editor: some of them are in fact directly executed by the 
text editor, while others are passed on to other programs 
that run as subsidiary processes to the text editor. The text 
editor integrates the set of document editing tools into a 
unified environment, and provides the command interface 
and help facility for them. 
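The dispatching role of the top-level editor can be sketched as a table lookup that either runs a command in-process or hands it to a subsidiary process. Everything here is hypothetical: the class and command names are invented, and an ordinary operating-system subprocess (the Python interpreter itself) stands in for a formatter or message system running under the Tops-20 executive.

```python
import subprocess
import sys


class Executive:
    """Top-level dispatcher: built-in commands run directly, the rest are
    passed to programs that run as subsidiary processes."""

    def __init__(self):
        self.builtin = {}    # command name -> Python callable
        self.external = {}   # command name -> argv prefix of a program

    def dispatch(self, name, *args):
        if name in self.builtin:
            return self.builtin[name](*args)
        if name in self.external:
            done = subprocess.run(self.external[name] + list(args),
                                  capture_output=True, text=True, check=True)
            return done.stdout
        raise KeyError(f"unknown command: {name}")


def demo_executive():
    """Build an executive with one built-in and one external command."""
    ex = Executive()
    ex.builtin["upcase-word"] = str.upper
    # Stand-in for an external formatter: echoes the file it was given.
    ex.external["format"] = [sys.executable, "-c",
                             "import sys; print('formatted', sys.argv[1])"]
    return ex
```

From the user's side both kinds of command are invoked the same way, which is the point of routing everything through one interface.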

The Document Editor is built within EMACS [10], an 
extensible text editor that integrates well with the multi- 
process executive of the Tops-20 operating system [2]. It 
edits documents represented in the Scribe document 
specification language [6], and invokes the Scribe 
compiler [7] to format them. It runs the Hermes [8] 
message system, providing a mechanism for exchanging 
pieces of documents with other users. 


4. The Tools 

The implementation of the Document Editor consists 
of the top-level executive and a set of tools that it invokes. 
The tools fall into three major functional groups: writing 
aids, structure editing aids, and document management 
aids. This section describes these groups of tools. 

4.1. Writing Aids 

A writing aid is any tool that helps with the strictly 
English language aspects of capturing one’s thoughts in a 
computer text file. Writing aids currently come in several 
styles: "batch", "interactive batch", and "immediate". 

The "batch" tools are utilities like Spell and the 
Writers Workbench [4] from Bell Laboratories. These are 
utility programs that are separate from the normal text 
editors. They are "batch" in that they run over a complete 
file. Some, like some versions of Spell [5, 11], are 
interactive, offering alternate spellings and allowing users 
to accept, reject, or replace the alternative. 

What I call an "interactive batch" aid is interactive 
because it is part of the text editor environment where you 
direct the course of its decisions during processing, and 
batch because you run it on a whole document or file at 
once. This class of aids is useful primarily for correcting 
problems that already exist rather than preventing 
problems from occurring. Some of the writing aids in the 
Document Editor belong to this category. For example, 
one function finds every "suspicious" occurrence of the 
word "which" and offers to change it to "that". (For 
example, "functions which" is a suspicious occurrence; "in 
which" is not.) Another finds each suspicious occurrence 
of "may" and offers to change it to "can", "might", or "is 
permitted to". These tools are more useful during editing 
(particularly editing of someone else’s writing!) than 
during initial creation of text. 
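An "interactive batch" aid of this kind can be approximated with a pattern match plus a per-occurrence decision. The rule used below for what counts as suspicious (a "which" that follows neither a comma nor a preposition) is an assumed heuristic chosen to agree with the two examples above; it is not the Document Editor's actual test, and the names are invented.

```python
import re

# Assumed heuristic: "which" after one of these words is left alone.
PREPOSITIONS = {"in", "of", "on", "to", "for", "by", "with", "from", "at"}
WHICH = re.compile(r"(\S+)(\s+)which\b")


def is_suspicious(previous_word):
    """True if a "which" following previous_word should be offered for change."""
    return not (previous_word.endswith(",")
                or previous_word.lower() in PREPOSITIONS)


def revise_which(text, accept):
    """Offer each suspicious "which" to `accept`; replace it where accepted.

    `accept` receives the matched phrase (for example "functions which")
    and returns True to change "which" to "that".
    """
    def offer(match):
        previous, gap = match.group(1), match.group(2)
        if is_suspicious(previous) and accept(match.group(0)):
            return previous + gap + "that"
        return match.group(0)

    return WHICH.sub(offer, text)
```

In an editor, `accept` would prompt the user at each occurrence; passing a function that always returns True turns the aid into a plain batch tool.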





An "immediate" aid is one that operates while you are 
writing. These are difficult to integrate into present editor 
designs. The Document Editor uses the "abbreviation 
expander" in EMACS. This function examines each word 
as it is entered and checks for its presence in a personal 
dictionary of "abbreviations". For example, in my world 
it expands "chares" into "characteristics". In addition, it 
transforms the string "probelm" into the word "problem" 
as I type it. This makes it into a powerful aid, a spelling 
corrector for frequent errors, instead of simply a tool for 
reducing keystrokes. 
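The behavior described here, checking each word as it is completed, can be sketched as follows. This is not the EMACS implementation; the class is a hypothetical stand-in that treats any non-alphanumeric character as the end of a word, and the dictionary entries are the two examples from the text.

```python
class AbbrevExpander:
    """Expand abbreviations word by word as characters are typed."""

    def __init__(self, abbreviations):
        self.abbreviations = abbreviations
        self.finished = []   # text whose words are already complete
        self.word = ""       # the word currently being typed

    def type(self, char):
        if char.isalnum():
            self.word += char
        else:
            # A delimiter completes the word: expand it if it is known.
            self.finished.append(self.abbreviations.get(self.word, self.word))
            self.finished.append(char)
            self.word = ""

    def text(self):
        return "".join(self.finished) + self.word


def type_string(expander, keys):
    """Feed a whole string through the expander, one character at a time."""
    for char in keys:
        expander.type(char)
    return expander.text()
```

Because a misspelling such as "probelm" is just another dictionary key, the same mechanism serves as a corrector for frequent typing errors.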

Much more fascinating work remains to be done in the 
area of stylistic writing aids, particularly in the area of 
immediate aids. The application area ranges from simple 
surface structure computation ("with 19 words, this 
sentence is longer than average") through more complex 
surface structure ("that's the fifth periodic sentence in a 
row") through full natural language analysis ("found 
possibly ambiguous referent for ‘it’"). The area of style is 
one for which few analogies yet exist in programmers’ 
environments [13]. 
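The simplest end of this range, surface computations such as sentence length, needs very little machinery. The sketch below is illustrative only: it assumes that sentences end at a period, question mark, or exclamation point followed by white space, which real text often violates, and the function name is invented.

```python
import re


def longer_than_average(text):
    """Return (sentence, word_count) for sentences longer than the mean."""
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    lengths = [len(s.split()) for s in sentences]
    average = sum(lengths) / len(lengths)
    return [(s, n) for s, n in zip(sentences, lengths) if n > average]
```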

4.2. Structure Editing Aids 

The Document Editor contains a selection of tools for 
manipulating the structural elements of a document. The 
structural elements are the regions of a document 
corresponding to its logical hierarchical components. For 
example, in a conventionally tree-structured document, a 
chapter is a structural element containing sections as 
nested structural elements. 

This section describes the classes of functions provided 
by the Document Editor for manipulating structural 
elements. 

4.2.1. Locators 

Locator commands aid by locating places in a file that 
have certain structural properties. For example, when I 
wanted to move to this section to add this paragraph, I 
wanted the "Next structural element" command. One 
common application of this class of commands might be 
for moving through a manuscript validating or revising all 
the introductory paragraphs. 

Down structural level 

Move to the beginning of the next structural 
level nested in this one. For example, in a 
chapter in a conventional document, this 
command moves to the beginning of the first 
section in the chapter. 

Up structural level 

Move to the beginning of the nearest enclosing 
structural level. That is, in a section in a 
conventional document, move to the beginning 
of the chapter that contains the section. 

Next structural element 

Move to the beginning of the next structural 
element at the same level in the document. In 
a conventional document, if you are in a 
section, move to the beginning of the next 
section; if you are in a chapter, move to the 
beginning of the next chapter. 

Any structural element 

Move to the beginning of the next structural 
element in the file regardless of its level in the 
structural description. 

Find structural element 

Move to the beginning of the next structural 
element with the level specified. 

These commands are of course in addition to the 
normal, context dependent, structure independent means 
for locating things of interest in the text. 
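The locator commands can be sketched over a flat outline, that is, the document's structural elements in file order, each with a nesting depth. The representation and function names are assumptions made for the example. Each function takes the index of the current element and returns the index to move to, or None if there is no such element. "Next structural element" is interpreted here as the next element of the same depth anywhere later in the file, which is one possible reading of the description.

```python
from dataclasses import dataclass


@dataclass
class Heading:
    depth: int    # 1 = chapter, 2 = section, and so on (assumed numbering)
    title: str


def find_element(outline, i, depth):
    """Find structural element: next element with the level specified."""
    for j in range(i + 1, len(outline)):
        if outline[j].depth == depth:
            return j
    return None


def next_element(outline, i):
    """Next structural element: next element at the same level."""
    return find_element(outline, i, outline[i].depth)


def down_level(outline, i):
    """Down structural level: first element nested in the current one."""
    j = i + 1
    if j < len(outline) and outline[j].depth > outline[i].depth:
        return j
    return None


def up_level(outline, i):
    """Up structural level: nearest enclosing element."""
    for j in range(i - 1, -1, -1):
        if outline[j].depth < outline[i].depth:
            return j
    return None


def any_element(outline, i):
    """Any structural element: next element regardless of level."""
    return i + 1 if i + 1 < len(outline) else None
```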

4.2.2. Constructors 

Constructor commands aid in adding structural 
elements of a specified type. 

Create empty structural element 

Create another element at the same level as the 
current one. For example, if you are in a 
section, it creates another section. This class of 
command also has commands for creating new 
elements of inner and outer types, for example, 
a chapter or a subsection. 

Create new structural element 

Create another element using a region 
currently selected by the editor’s select 
mechanisms. 

Make a list of... 

Create lists of various things in job 
environment, for example, a list of all the files 
involved in a given document, a list of all the 
sections that have changed during a certain 
time period. (Parallel development of these 
concepts appears in [1].) 

These commands support two distinct ways of treating 
new elements in a document. In one case, you declare the 
hierarchical status of each thing as you create it. In the 
other, you classify existing text according to its 
organizational status. Both methods fill valid needs. It is 
fundamentally important for the user interfaces in 
document editors to provide both facilities. 

4.2.3. Mutators 

Mutator commands aid in revising structure by 
changing the abstract status of a particular portion of a 
manuscript. The advantage to this is that you think about 
the relationship between the objects rather than dealing 
with the formatting commands that produce the formatted 
version of the objects. 

Change structural level 

Change the status of the current structural 
element (Push, Drop, Raise, and Pop). For 
example, "Push" for a section makes the 
current section into a subsection and any of its 
current subsections into subsubsections. 
"Drop" for a section makes the current section 
into a subsection at the same level as its former 
subsections. 
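The difference between "Push" and "Drop" is easiest to see on a list of nesting depths, one per structural element in file order. The sketch below covers only the two operations described above; "Raise" and "Pop" are not defined in the text and are left out. The list representation and function names are assumptions made for the example.

```python
def subtree_end(depths, i):
    """Index just past the last element nested inside element i."""
    j = i + 1
    while j < len(depths) and depths[j] > depths[i]:
        j += 1
    return j


def push(depths, i):
    """Push: demote element i together with everything nested in it."""
    end = subtree_end(depths, i)
    return depths[:i] + [d + 1 for d in depths[i:end]] + depths[end:]


def drop(depths, i):
    """Drop: demote only element i, so it joins the level of its former
    immediate sub-elements."""
    return depths[:i] + [depths[i] + 1] + depths[i + 1:]
```

For a chapter (depth 1) containing a section (depth 2) with two subsections (depth 3) and then another section, the depths are [1, 2, 3, 3, 2]. Pushing the first section gives [1, 3, 4, 4, 2]; dropping it gives [1, 3, 3, 3, 2].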

4.2.4. Selectors 

Selector commands aid in visualizing the 
organizational structure of a growing document by 
showing certain levels of it. For example, to answer the 
person who just walked in and asked how the paper was 
going, I needed an outline of the current draft. It shows 
me the various parts and titles according to what it 
understands about the organization I am using.
Sometimes I want to use this to examine the status of a
document (how much is in the parts) and sometimes I
want to use this to examine its organization (what the
parts are).

In the second case of looking at the organization of a
document, I also want to be able to reorganize the
document by reorganizing the titles in a list. Selecting a
line in the organization display also serves to select that
location in the manuscript file for the document.

Show document status 

Show the level and title of the structural 
element currently being displayed. 

Show document structure 

Show the titles of the document structural 
elements. This command takes an argument 
for the depth to show. In one form, it shows 
the first few lines of each element; in another 
form, it reports the size of each element in lines 
(or whatever unit is requested). 

4.3. Document Management Aids 

Preparing a technical document is an information 
management task. The Document Editor contains several 
classes of functions for dealing with some of the 
specialized information and integration requirements of 
technical text. 

4.3.1. Managing Cross References

All technical documents require cross references. 
These are usually in sentences of the form "See Table 5", 
"See Section 5.4.7.2", or "See page 29". The Scribe text 
formatter provides a simple mechanism for indicating 
where you want a cross reference to point without having 
to specify literally its numeric contents. (For example, 
instead of saying "See Table 5", the manuscript says "See
Table @ref[IdTable]" and that table itself says "I am the
table whose reference label is IdTable".) This mechanism
provides cross references in the output that are always 
correct, without any hand editing or checking, no matter 
what changes occur in the document. 

The luxury of this kind of facility leads to a number of 
problems. For one thing, you tend to use many cross
references because they are "cheap". These many cross
references are almost never checked by hand at any stage
of the work. This is an instance of the relaxed vigilance
that is fostered by having computers do their job. Because
you know the cross reference is correct in the numeric
sense, you stop checking that it applies. For example, it is
common to change terminology and emphasis without
remembering to check the references to make sure they 
still apply. 

An interactive utility is needed here to help with the 
very tedious job of locating the other end of a cross 
reference point. 

Follow CREF pointer

Follow the cross reference pointer in this
sentence and show the area of the manuscript
that it points to in another area of the screen.

Find all fingers

Find, one at a time, all the cross reference 
pointers in the manuscript that point to this 
particular spot. For example, we are in a table 
whose reference name is IdTable and we want 
to know how many other locations reference 
this spot and see what they have to say about it. 
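
The two functions can be sketched over a manuscript held as a list of lines. This is a minimal illustration under stated assumptions: pointers are written "@ref[Name]", targets are marked "@label[Name]" (the label command name is assumed here, not taken from the text), and the function names are hypothetical.

```python
import re

# Assumed markup: "@ref[Name]" is a pointer, "@label[Name]" is its target.
REF = re.compile(r"@ref\[(\w+)\]")
LABEL = re.compile(r"@label\[(\w+)\]")

def follow_pointer(lines, name):
    """Return the 1-based line number where the label is defined, or None."""
    for number, line in enumerate(lines, start=1):
        if name in LABEL.findall(line):
            return number
    return None

def find_all_fingers(lines, name):
    """Return the 1-based line numbers of every pointer to the label."""
    return [number for number, line in enumerate(lines, start=1)
            if name in REF.findall(line)]

manuscript = [
    "See Table @ref[IdTable] for the totals.",
    "@label[IdTable] Totals by region.",
    "The totals in Table @ref[IdTable] exclude returns.",
]
```

An interactive version would display the text around each returned line rather than just its number, so that the writer can judge whether the reference still applies.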

4.3.2. Indexing 

Indexing a technical document is hard work. Many 
computer people believe that computers must surely have 
solved the indexing problem and that indexes can be
generated trivially by a macro that runs through the source 
text looking for keywords and putting them in an index. 
This is not how indexes of any practical use are generated. 

Indexing is a difficult task requiring skill, practice, and 
a large mental symbol table. Until natural language 
research solves the problems of semantics of natural 
language, we are stuck using humans rather than 
computers for generating indexes. Some interactive 
utilities make indexing a far more reasonable task. 

One requirement in indexing is having several 
different ways of entering a single reference to a single 
location. For example, suppose you want to index the 
concept "Stacking potatoes". This goes into the index in 
two forms: 

Potatoes,
    stacking, 49
Stacking potatoes, 49

It is reasonable for you to type in the basic phrase and for 
the computer to take care of the commands to generate the 
permutations. However, you don’t always want all the 
permutations. For example, suppose the document
contains a discussion of the distinction between 
Structuralism and Functionalism. You provide the index 
concept "Structuralism vs. Functionalism". The utility 
now offers the following possibilities, only some of which 
are really useful in an index. 

Functionalism vs. Structuralism, 121
Structuralism,
    vs. Functionalism, 121
vs.,
    Functionalism, Structuralism, 121

Entries like the third possibility do not normally appear in 
an index. Hence you need an interactive indexing utility 
both to weed out the ones that shouldn’t be there and to 
add any special cases that the algorithm did not generate. 
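
As a rough sketch of the "generate, then weed out" idea, the code below produces one candidate entry per word of the phrase by rotation and lets the user keep a subset. The rotation rule and both function names are assumptions made for illustration; the algorithm actually used by the utility is not described here.

```python
def candidate_entries(phrase):
    """Return the phrase itself plus one rotated entry per later word."""
    words = phrase.split()
    entries = [phrase]
    for i in range(1, len(words)):
        head = words[i].capitalize()
        rest = words[i + 1:] + words[:i]          # remaining words, rotated
        entries.append(head + ", " + " ".join(w.lower() for w in rest))
    return entries

def choose(entries, keep):
    """Keep only the candidates whose positions the user selected."""
    return [entries[i] for i in keep]

potato_entries = candidate_entries("Stacking potatoes")
# Both candidates are useful here: the phrase and "Potatoes, stacking".

versus_entries = candidate_entries("Structuralism vs. Functionalism")
useful = choose(versus_entries, [0, 2])   # discard the entry headed by "Vs."
```

The mechanical step is trivial. The judgment about which candidates belong in the index stays with the person doing the indexing.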

An index is basically a cross reference with a special 
format. Thus, the functions for cross reference 
maintenance are valid in indexing as well. 

Follow index pointer 

Given a particular index "symbol", show all 
the locations in the document that this symbol 
references. This is necessary for reducing the 
task of verifying validity of index entries to 
manageable proportions. This function is well- 
defined. 

Find all fingers 

Given a particular document "location", collect 
the index entries that point to it. In the Scribe
world, where index commands are visible in 
the file, this command has limited utility (you 
can see them just by looking). A different 
meaning for the command emerges, however, 
by expanding the meaning for "location" 
beyond some small window. When location is 
a designated section or region of the document, 
the command is more useful, collecting the 
entries so that some judgment can be made 
about their overall adequacy. 

Make index entry 

Take an index phrase and show (in another
viewing window) the potential entries that can 
be formed from that phrase. It shows the 
entries in the format that they would appear in 
the final document. You can select from this 
list the appropriate ones, add some to the list or 
designate "all". The utility inserts the 
appropriate indexing commands into the 
manuscript. 

Show index symbols 

Show the phrases that have already been used 
in the index, allowing you to search the list or 
abstract from it all the lines containing 
particular targets. 

Find word in index symbols 

Find all index phrases that contain a particular
word. This replaces the index cards in a hand 
version of the task. 

Modify index entry 

Find all of the occurrences of a particular index 
phrase and change them and their 
permutations to use new wording. 


Analyze index

Survey the "adequacy" of the index by
comparing the size of the index to the size of
the document. Create a frequency distribution
of index entries by section to help in locating
dead spaces that are inadequately covered in
the index.
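
The frequency distribution in "Analyze index" amounts to a count of entries per section. The sketch below assumes the index is available as (section, phrase) pairs; that representation, the function names, and the "minimum" threshold are all hypothetical.

```python
from collections import Counter

def entries_per_section(sections, index_entries):
    """Count index entries for every section, including sections with none."""
    counts = Counter(section for section, _phrase in index_entries)
    return {section: counts.get(section, 0) for section in sections}

def dead_spaces(sections, index_entries, minimum=1):
    """List the sections that have fewer than `minimum` index entries."""
    distribution = entries_per_section(sections, index_entries)
    return [s for s in sections if distribution[s] < minimum]

sections = ["4.1", "4.2", "4.3"]
index_entries = [("4.1", "Potatoes"), ("4.1", "Stacking"), ("4.3", "Scribe")]
```

A count alone says nothing about whether the entries are good ones; it only points to places worth a second look.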

4.3.3. Partial Output 

Most embedded command formatting programs 
accept a complete manuscript source and produce a 
complete document output. For large documents this is 
inefficient in both computer time and human time. Scribe 
has provided a partial solution through partial compilation 
of documents. In this scheme, a large document is 
produced from a set of input files. For example, a nine- 
chapter book typically consists of at least 11 files: one head
file with definitions and front matter, nine files for nine 
chapters, and one for the appendixes. Each manuscript 
file indicates which larger document it is part of. Each 
part file can be run through the formatter separately while
still maintaining its appearance as part of, and more important, its
cross references to, the rest of the document.

In large documents, the part files themselves tend to 
be relatively large. The file system and various utilities
discourage having many files (several hundred, say) 
involved in a single document. So, for pragmatic reasons, 
it is necessary to have the option of excerpting and 
formatting parts of the source that are smaller than a 
whole file. This is important during initial development of 
a manuscript so that formatted files containing only new 
material can be produced and circulated to others on a 
project. This helps you keep better track of what has been 
produced in a given time period and increases the chances
that other project members will provide timely comments 
on what you wrote (by decreasing the volume they have to 
wade through). 

The Document Editor implemented a simple facility 
to aid with partial output requirements. It saved
designated structural elements of a document in their
appropriate relative relations to each other. For example,
the document in question consisted of 200 topic items in
alphabetical order. When a particular topic item was 
saved, it was inserted in a sorted file containing all the 
other items that had been saved on the same day. The file 
contained the Scribe commands for producing this 
excerpted document using the same format design as the 
whole document. This facility proved very useful. 

4.3.4. Annotation 

Keeping track of the information in a technical 
document is a large information management task. I am 
constantly looking for mechanisms to help me collect and 
survey information pertaining to a particular area of a 
document. That is, one needs a mechanism for associating 
with any alleged fact in a developing technical document 
the pedigree for the fact. This kind of information can
range from date of addition to the manuscript to the
specification for a collection of messages in which the 
project staff present their differing interpretations of the 
item under discussion. 

In an ideal system, the document could grow from 
functional specifications of the object being documented, 
using portions of the functional specifications as the 
original annotations in the manuscript. In this rosy view, 
the data base continues to evolve with changes in the 
product design and through continuing releases of the 
product. 

Most of you will have noticed that this proposal 
requires radically different tools from those now used for 
editing and formatting. One of the more modest ideas, 
however, is feasible for implementing with current 
systems. This is a utility that makes it easy to date portions 
of a file and indicate who provided the information that 
the manuscript portion describes. This is valuable during 
major revisions of documents, in either draft or 
maintenance phase. It could be used, for example, in 
determining where to place those infamous marginal 
markers known as change bars. 

Other kinds of annotation include things like "notes to 
self", reminders about things to be added, descriptions of
art work needed, or locations of originals to be pasted in. 
The major need is for mechanisms that make it easy to 
associate this kind of information with the appropriate 
places in the manuscript and to make it easy to survey or 
hide these annotations. 

4.3.5. Many Documents from One Source 

Documentation for a large system program rarely 
consists of a single huge manual (or shouldn’t). It is 
reasonable to want to produce a reference manual, a 
summary guide, a help file for online use, and a form of 
the reference manual stored online for use by a program 
that finds reference material for users. It is also 
reasonable, for maintenance reasons, to want to produce 
all these outputs from a single set of manuscript sources. 
An ideal preparation system would make this task easy. 

Some of the problems are similar to those of managing 
conditional compilation for software. The structure of 
well-written English makes it more difficult to
include or exclude sentences than it is to include or 
exclude blocks from well-written code. As far as I know, 
tools to make this job easy remain an open research 
question. 

4.3.6. Maintaining a Document 

Document maintenance is perhaps a more difficult 
task than software maintenance. In both cases, when a 
certain feature changes, you need to find all the places 
where the feature appears and all cross references to that 
feature and make the appropriate revisions. In software 
systems, compiler, debugger, and cross-reference 
(concordance) tools make the job fairly reasonable. In the
documentation world, without a powerful text editor, the
only way to find everything relevant is to read the whole
draft. With a powerful text editor, the locator functions 
from indexing and cross referencing can be applied to the 
task. (Even then you have no guarantees of complete 
success.) 

In writing, as in programming, the more disciplined 
the structure of the original sources, the more 
maintainable the document. 


5. Conclusions 

Initial experience with the implementation indicates 
that the set of primitives chosen covers the application 
area reasonably well. The Praxis Language Reference 
Manual (a 300-page document) was produced under tight 
deadlines using an early version of the Document Editor 
and could not have been done as successfully without it. 

The system operates reasonably well but cannot 
address some problems at all, for example, the issues 
raised concerning annotation and maintenance. The 
limitations are imposed by the pragmatic necessity of 
using text files as the representation of the documents.
Further progress requires designing file formats for 
structured representation and building a file system using 
those formats. 

Acknowledgments 

Bill MacGregor, Richard Stallman, Pete Miller, and 
Graeme Williams commented on earlier drafts of the 
paper. Thanks to Earl Killian and Gene Ciccarelli for 
discussions and advice. Special thanks to Brian Reid for 
suggestions, enthusiasm, and creative disagreement. 


References 

1. Ciccarelli, E. C. Presentation Based User Interfaces.
Ph.D. Thesis Proposal, MIT, 1981.

2. DECSYSTEM-20 User's Guide. Digital Equipment
Corporation, 1977. Order No. AA-4179B-TM.

3. Habermann, A. Nico. An overview of the Gandalf
project. Computer Science Research Review, Carnegie-
Mellon University, 1979.

4. Macdonald, N. H., Frase, L. T., & Keenan, S. A.
Writer's Workbench: Computer Programs for Text
Editing and Assessment. Bell Laboratories, 1980.

5. Peterson, James L. "Computer Program for Detecting
and Correcting Spelling Errors." Comm. ACM 23, 12
(Dec. 1980).

6. Reid, Brian K. Scribe: A Document Specification
Language and its Compiler. Ph.D. Th., Carnegie-Mellon
University, Dec. 1980.

7. Reid, Brian K. & Walker, Janet H. Scribe User
Manual. 3rd Edition, Preliminary edition, UNILOGIC
Ltd., 1980.

8. Rude, R. V., & Mooers, C. D. The Hermes Message
System. Integrated Communications Management. Bolt
Beranek and Newman Inc., 10 Moulton Street,
Cambridge, MA 02238, 1978.

9. Sandewall, Erik. "Programming in an Interactive
Environment: the LISP experience." Computing Surveys
10, 1 (March 1978).

10. Stallman, Richard M. EMACS: The Extensible
Customizable Self-Documenting Display Editor.
Conference Record, ACM SIGPLAN/SIGOA
Symposium on Text Manipulation, Portland, Oregon,
June, 1981.

11. TENEX Users' Guide. Bolt Beranek and Newman
Inc., 10 Moulton Street, Cambridge, MA 02238, 1977.

12. Augmentation Research Center. TNLS User's Guide.
Stanford Research Institute, Menlo Park, CA 94025, 1973.
ARC Journal 19200.

13. Waters, R. C. The Programmer's Apprentice:
Knowledge Based Program Editing. Submitted to IEEE
Trans. on Soft. Eng., 1981.



Checking for Spelling and Typographical 
Errors in Computer-Based Text 

Thomas N. Turba 
SPERRY UNIVAC
Language Systems 
Roseville Development Center 


Abstract 

This paper addresses the problems and techniques of 
checking for spelling and typographical errors in 
computer-based text. To some extent, the paper is a 
combination of a report of work done by the author and a
survey of other work which, although not all used by the 
author, is of equal value and interest. Some of the material 
presented is related to other aspects of text processing such 
as data compaction and the efficient searching of very large 
dictionaries. 


1. Introduction 

Most of the examples presented here are related to English 
text though the techniques are equally applicable to other 
languages. English is, however, a good choice of a language 
for a spelling checker because it is almost universally 
recognized that English spelling is among the hardest in the 
world and that no one is really a complete master of it. This 
is in part due to:


■ English is not a very phonetic language, in that often 
there is not a direct correspondence between the sound 
and spelling of a word. 

■ English has borrowed a large number of words from 
other languages which do not follow English conventions 
for spelling. 

■ English has multiple prefix and suffix forms that serve 
the same purpose and have only minor variations in 
spelling (for example, the prefixes en, in and suffixes
able, ible and ance, ence, ince).

All of these problems lead to the fact that spelling in English 
is often incorrect. This brings us to the main subject of the 
paper, a tool for the automatic checking of computer-based 
text for spelling and typographical errors. 

2. Degree of Checking 

It would be best if a spelling checker would check for correct 
sentence structure, proper word use, and other factors in 
addition to correct spelling, much as we would expect a good 
proofreader to do. Complete proofreading, however, is a very 
optimistic goal because it would require a high degree of 
understanding on the part of the program. Although this has
been done for limited vocabularies with limited sentence
structures, to the best of the author's knowledge, it has not
yet been done in the general case across a large vocabulary
with a large number of sentence structures as is normally
found in most text processing applications.

Permission to copy without fee all or part of this material is
granted provided that the copies are not made or distributed
for direct commercial advantage, the ACM copyright notice
and the title of the publication and its date appear, and notice
is given that copying is by permission of the Association for
Computing Machinery. To copy otherwise, or to republish,
requires a fee and/or specific permission.

© 1981 ACM 0-89791-043-5/81/0600/0051 $00.75

The inability to do as precise checking as might be desired 
should not discourage us. A considerable number of errors 
can be found by doing checking on the word level and leaving 
the larger context for proofreaders. Many of the errors 
caught by computer analysis on the word level will, in fact, 
be errors that can easily be missed even by a good 
proofreader. This is because the human mind, even when 
explicitly looking for errors, will often subconsciously fill in 
and modify the characters that are seen so that sense can be 
made out of what is being read. 


3. Word Composition 

If we start on the premise that checking will be done on the 
word level, we must first define what a word is. Generally 
we think of a word as being composed of the alphabetic 
characters a...z and their uppercase equivalents A...Z.
Words, however, may contain characters such as a hyphen
(-), apostrophe ('), or accent mark (´). In addition, foreign
languages contain other alphabetic characters such as ñ, é, ß,
è, and many others. If the text is describing a programming
language, the rules for forming words (or basic lexical units) 
may indicate including digits (0...9), underscore (_), dollar 
sign ($), and other characters in addition to those normally 
allowed in forming a word. 

In addition to determining what a word is, we must also 
address the question of what is not a word, or more precisely, 
what terminates a word. Although this may seem obvious 
(because anything that is not in a word must terminate a 
word) there are times when it is not always that simple. For 
example, does the physical end of a line indicate the end of 
a word? Normally it will. However, if text has been 
formatted by a typist, it may not when a word is hyphenated 
at the end of a line. These rules will also change for text 
that is input for a documentation processor, since the rules 
for forming a word must conform to the rules in the 
documentation processor. This brings us to another point, 
because text for a document processor normally consists of 
text and composition commands. These commands often look 
like words; however, they should not be treated as such. 
What this boils down to is that a set of rules must be set forth 
indicating what constitutes a word to be checked. This can 
be built into a lexical analyzer that will recognize words to 
be checked. This analyzer will correspond to the rules for 
recognition, and may actually be several interchangeable sets 
of rules. 

If interchangeable sets of rules are employed, a single
spelling checker can be used with various types of text such
as for different document processors or text that is already 
formatted. In addition, it permits more than one set of rules 
for a given type of text. Therefore, the user can choose
whether or not words are to be broken at a hyphen or
apostrophe, or whether other recognition options are to be 
used. The way in which these rules are encoded will depend
on the tools available to the implementor. However, it
cannot be overstated that an automated tool such
as [T1] (which was used by the author) should be used
when available. Such tools can not only
produce a more flexible implementation but can also improve 
efficiency if implemented as a finite state machine. 
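
Interchangeable rule sets can be sketched with regular expressions standing in for the generated lexical analyzer. The two rule sets, their names, and the "@name" command syntax below are assumptions chosen for illustration; they are not the rules or the tool used by the author.

```python
import re

# Two hypothetical rule sets. Each pattern describes one word to be checked.
RULES = {
    # Letters only: a hyphen or apostrophe terminates the word.
    "plain": re.compile(r"[A-Za-z]+"),
    # Letters joined by internal hyphens or apostrophes stay one word.
    "joined": re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*"),
}

def words_to_check(text, rule="plain", skip_commands=True):
    """Return the words recognized under the chosen rule set."""
    if skip_commands:
        # Assumed "@name" composition commands are not words to check.
        text = re.sub(r"@\w+", " ", text)
    return RULES[rule].findall(text)

sample = "@begin the user's well-known text"
```

Switching the rule argument changes what counts as a word without touching the rest of the checker, which is the point of keeping the rules separate.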

In addition to being able to specify the rules for word 
composition, the user should also be able to control what type 
of words are to be checked. For example, it may be desirable 
to exclude from checking words which correspond to variable 
names in a programming language, or character groupings 
such as 12th, 23rd, etc. These options should be alterable 
from execution to execution so the user can choose the degree 
of checking desired for each document. 


4. Checking Methods 

Several methods have been used to check for spelling and 
typographical errors. Each method has its own advantages 
and disadvantages. The methods are described separately, 
though they often have been combined in actual use. 


4.1. Statistical Analysis

This method is based on a frequency analysis of adjacent 
characters in words to be checked. This is done by having 
a sample text from which frequency counts are derived for 
fixed length character groupings such as digraphs or 
trigraphs (two or three character groupings). Once a 
frequency table has been built, it can be used to check text. 
Words which have an abnormal, or low, frequency profile can 
be flagged as possible misspellings. The primary advantage 
of this method is that it attempts to reduce the amount of 
storage needed to verify words while at the same time it can 
handle a large number of words. This advantage, however, 
is far outweighed by the fact that unless a reasonably large 
dictionary is used in conjunction with the frequency tables, 
a large number of valid words will be flagged as possible 
errors. In addition, a considerable number of incorrect words 
will not be detected. For these reasons, this method is rarely 
used today though the use of frequency analysis still has 
application in other areas such as cryptanalysis. People 
interested in further information on this type of checking 
should consult [M1] and [W1].
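
A minimal digraph version of this method is sketched below. The "#" word-boundary marker, the threshold, the tiny sample text, and the function names are assumptions made for the example; the checkers cited above may score words differently.

```python
from collections import Counter

def digraphs(word):
    """Adjacent character pairs, with '#' marking the word boundaries."""
    padded = "#" + word.lower() + "#"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]

def build_table(sample_words):
    """Count how often each digraph occurs in the sample text."""
    table = Counter()
    for word in sample_words:
        table.update(digraphs(word))
    return table

def is_suspect(word, table, threshold=1):
    """Flag a word if any of its digraphs is rarer than the threshold."""
    return any(table[d] < threshold for d in digraphs(word))

table = build_table(["the", "then", "this", "that", "hat"])
```

With this sample, "thq" is flagged, but so is the valid word "sit", while the non-word "hathe" passes because every one of its digraphs occurs in the sample. Both weaknesses described above show up even at this scale.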


4.2. Stripping 

This method consists of stripping off valid prefixes and 
suffixes until the word has been reduced to its root, which 
is then checked in a dictionary. This approach has been used 
to a greater or lesser degree in a number of spelling checkers 
such as [G1, D1, Z1]. This method has the advantage that
a small dictionary can be used to check a large number of 
variations on the same word. Therefore, these variations 
need not be represented explicitly but can still be recognized. 
In addition, this method has the advantage that new
combinations of prefixes and suffixes can be recognized 
without a change to the spelling checker because any prefix 
or suffix can normally be combined with any root word. This 
flexibility, however, is one of the downfalls as well as one of 
the advantages of this type of checking. This is because it 
is not a sufficient criterion for a word to be correctly spelled 
simply because it is made up of a valid prefix, root, and 
suffix. At first, this may seem incongruous; however, Table
4-1 sheds a little light on the problems involved.


Table 4-1. Incorrect Word Joining

Incorrect     Correct       Reason for Error
----------    ----------    -----------------------------
disspelled    dispelled     Invalid prefix for word
criterias     criteria      Inappropriate suffix for word
occurrance    occurrence    Incorrect suffix for word
inmature      immature      Invalid prefix for word
beginer       beginner      Incorrectly joined suffix
ingrave       engrave       Invalid prefix for word
selecter      selector      Incorrect suffix for word
compatable    compatible    Incorrect suffix for word
queueing      queuing       Incorrectly joined suffix


As can be seen in Table 4-1, the use of prefix and suffix 
stripping can lead to missing errors in the text that is being 
checked (some of which are very frequent errors). Because 
the main object of a spelling checker is to catch errors, 
rather than to verify correct word form, it should be realized 
that there is a definite tradeoff in the use of stripping. This, 
however, should not discourage its use when appropriate (as 
for example, when dealing with medical or chemical terms 
which follow a more consistent set of rules than does more 
general English). These areas have a large set of valid words 
that are built up in a regular fashion so the savings on 
dictionary space can be significant. Many of the problems of 
stripping can be mitigated by only using reliable prefixes and 
suffixes or by employing a set of rules to control stripping 
that are based on the language. This technique was 
employed in the Stanford spelling checker [G1] and is
described in [P1] and [P2].
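
The tradeoff can be seen in a small sketch of unrestricted stripping. The affix lists and the three-word root dictionary are invented for the example, and the sketch removes at most one prefix and one suffix, which is a simplification of the repeated stripping described above.

```python
# Hypothetical affix lists and root dictionary, for illustration only.
PREFIXES = ("dis", "in", "im", "en")
SUFFIXES = ("ance", "ence", "able", "ible", "er", "or", "s")
ROOTS = {"select", "mature", "criteria"}

def candidates(word):
    """Return the word plus every stem left after removing one prefix,
    one suffix, or one of each."""
    stems = {word}
    stems |= {word[len(p):] for p in PREFIXES if word.startswith(p)}
    stems |= {s[:-len(x)] for s in set(stems) for x in SUFFIXES if s.endswith(x)}
    return stems

def accepted(word):
    """A word passes if any candidate stem is a known root."""
    return any(stem in ROOTS for stem in candidates(word))
```

Here accepted("selector") is true, as intended, but so are accepted("selecter"), accepted("inmature"), and accepted("criterias"), all of which appear in the Incorrect column of Table 4-1. A word with a damaged root, such as "slector", is still rejected.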


4.3. Complete Lookup 

This method consists of having a large dictionary which 
contains all forms of the words which are to be considered 
valid. It has the advantage that only those words which are 
considered valid will be accepted and all others will be 
flagged. Therefore, checking can be tailored to as small or 
as large a set of words as is desired. Such a capability can 
be used to advantage because many frequently used words 
have similar, equally-valid, application-specific words which 
could accidentally be mistyped or misspelled in their place.
This selectivity can be used to advantage in other ways. For 
example, it can be used to enforce a limited vocabulary on 
documents intended for a restricted class of readers (such as 
foreign military personnel, a given grade level of students, 
etc.). This selectivity, however, has its price because each 
variation of a word must be explicitly represented and any 
variation, no matter how slight, that is not in the dictionary 
will be flagged as a possible error. 

A variation on complete lookup can be accomplished without 
actually having all the various word forms in the dictionary. 
This can be done by having associated with each root word 
information indicating which prefixes and suffixes can be 
joined to the word. Such a representation can reduce the size 
of a dictionary but is traded off for increased complexity and 
lookup time. It does not affect the number of actual words
that can be recognized as does true stripping. 
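
One way to picture this variation is a dictionary in which each root carries the only affixes it may take. The entries and the function name below are hypothetical, and the loop over every root is written for clarity rather than speed.

```python
# Hypothetical dictionary: each root lists the only affixes it may take.
# The empty string means "no prefix" or "no suffix".
DICTIONARY = {
    "select": {"prefixes": {"", "de"}, "suffixes": {"", "s", "ed", "or", "ion"}},
    "mature": {"prefixes": {"", "im"}, "suffixes": {"", "s", "d"}},
}

def is_valid(word):
    """Accept a word only if it is a root with affixes allowed for that root."""
    for root, allowed in DICTIONARY.items():
        start = word.find(root)
        if start < 0:
            continue
        prefix = word[:start]
        suffix = word[start + len(root):]
        if prefix in allowed["prefixes"] and suffix in allowed["suffixes"]:
            return True
    return False
```

Unlike unrestricted stripping, this form rejects "selecter" and "inmature" while still accepting "selector", "immature", and "deselected", because the set of recognized words is exactly the set that a full word list would contain.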



5. Dictionary Representation 

All spelling checkers have a dictionary of one type or 
another. In fact, most will have several types of dictionaries, 
both static and dynamic. In this section, some of the more 
common forms that have been used in the past will be 
explored. Indication is given as to their appropriate use and 
special considerations. This section should not be considered 
exhaustive and complete. Readers interested in further 
information on searching, such as might be used for a 
dictionary, should consult [K1] and other similar sources.

5.1. Sequential Lists 

Probably the simplest form of a dictionary is a sequential list 
where all words are arranged as they would be in a 
dictionary. Conceptually such a list is just a series of words 
such as: a~aardvark~aback~abacus~abalone~abandon~... 
Such a list can be compactly represented in storage by having 
the characters for each word appear adjacent to the 
characters for the next word with only a special character 
separating them. Such a representation, however, would be 
inefficient to search if the list is very long since each 
character must be looked at. Some of this inefficiency can 
be removed by having indexes into the list indicating where 
to begin searching. Such an index is the equivalent of the 
index tabs on a dictionary indicating where words are found 
that begin with a, b, c, etc. This technique, however, has
its limits since sequential searching must still be done. An 
alternative to using an index is to make each of the words 
in the list of equal length. This can be done by padding 
words out with spaces or some other inert character. Once 
this is done, words in the list can easily and efficiently be 
found by a binary search. Adding the padding characters, 
however, has its cost since space for representing words is 
significantly increased and this increase is normally too 
large to be acceptable unless the list is on backing storage. 
A third alternative for a sequential list is to have associated 
with each word a field indicating where the next word is 
located. This field can be a character count for the word, an 
offset to the next word, or a pointer (making each word an 
entry in a linked list). Therefore, due to the efficiency 
considerations, sequential lists are generally not used in 
spelling checkers except for short lists, or lists that are 
combined with some other method of access. 
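
The first two representations can be compared in a short sketch. The six-word list, the "~" separator, and the function names are taken from or invented for this example only; a real dictionary would be far larger and would not live in an in-memory string.

```python
WORDS = ["a", "aardvark", "aback", "abacus", "abalone", "abandon"]

# Form 1: words packed end to end with a separator character.
packed = "~" + "~".join(WORDS) + "~"

def in_packed(word):
    """Sequential search: scan the packed string for ~word~."""
    return ("~" + word + "~") in packed

# Form 2: every word padded to the same width, so entry i starts at i * WIDTH.
WIDTH = max(len(w) for w in WORDS)
padded = "".join(w.ljust(WIDTH) for w in WORDS)

def in_padded(word):
    """Binary search over the fixed-width entries."""
    target = word.ljust(WIDTH)
    low, high = 0, len(padded) // WIDTH - 1
    while low <= high:
        mid = (low + high) // 2
        entry = padded[mid * WIDTH:(mid + 1) * WIDTH]
        if entry == target:
            return True
        if entry < target:
            low = mid + 1
        else:
            high = mid - 1
    return False
```

Even for six words the padded form is longer than the packed form (48 characters against 41), which is the space cost noted above. The binary search works because padding with spaces preserves the sorted order of the list.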


5.2. Tree Structures 

A tree structure can be an appropriate way to represent both 
a static unaltering dictionary and a dynamic dictionary that 
changes (grows) during the execution of a spelling checker. 
There are many types of tree structures. These variations,
however, will not be addressed here. Interested readers
should consult [K1]. For the purpose of this article, trees
will be considered to be of two types: those which contain
a complete word in a node or entry in the tree, and those that
contain a single character in a node. The difference between
these two types of trees can be seen in Figures 5-1 and 5-2,
which contain the words the, then, and this. In these
figures, Λ indicates a null or void link within the tree.


5.2.1. Word Node Trees 

Nodes in a tree which contains full words are shown here to 
have three fields: a left link (which points to words lower 
in the collating sequence), a right link (which points to words 
higher in the collating sequence), and a value field (which is 
the word associated with the node). The nodes can be
considered to be variable in length depending on the size
needed to represent the word. In actual practice, the nodes 
would contain a length field for the word and, very often, 
backward links to aid in traversing the tree. 



Figure 5-1. Full Word Node Tree 

This type of structure does not need extra space for padding; 
however, it does need space for the links which can be either 
pointers or offsets. In addition, a considerable amount of 
time can be spent running links even on a well balanced tree 
if a large number of entries are in the tree. For this reason, 
a tree structure such as this is generally not well suited for 
a large dictionary. It is, however, a reasonable choice for 
keeping track of words not found in a large dictionary (such 
as the list of possible misspellings that are found when 
checking a document). This type of structure is well suited 
for such purposes because new entries can easily be added to 
the tree. 


5.2.2. Character Node Trees 

Nodes in a tree which contain only an individual character 
at each node are represented here as having five fields. 
These fields consist of: left and right links for sorting as 
before, a character field for the character at this position, a 
flag field indicating whether this character marks the end of 
a word, and a link field which leads to further words 
beginning with this character sequence. The tree is searched 
by comparing the first character in the word to be checked 
against the character in the first node in the tree. If they 
do not match, either the left or right link is taken in an 
attempt to find a matching character. If the character from 
the word matches the character in the node, further 
matching (if any) will be done by proceeding down the center 
link in the node and looking at the next character in that 
node. This process would continue until the word is 
completely matched or a failure is indicated. 



Figure 5-2. Character Node Tree

As can be surmised, this type of tree structure is also not the 
best choice for a large dictionary because considerable time 
would be spent running links. In addition, the links 
themselves would take up a considerable amount of storage.
Such a tree structure does, however, have its place in a 
spelling checker because it is a good structure for 
representing prefix and suffix lists which are not that large
and are normally used character by character for matching.
Such a tree can be dynamically initialized by reading in the 
prefix and suffix lists, but more likely, such a tree would be 
built into the program by use of a macro assembler or table 
builder. In such a case, the links would be represented as 
offsets into a table. 
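A character node tree with the five fields described above might be sketched as follows. The class and function names are hypothetical, and Python object references stand in for the pointers or table offsets that an actual implementation would use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CharNode:
    char: str                             # character at this position
    end_of_word: bool = False             # flag: a word ends here
    left: Optional["CharNode"] = None     # characters lower in collating sequence
    right: Optional["CharNode"] = None    # characters higher in collating sequence
    center: Optional["CharNode"] = None   # continues words with this prefix


def insert(node: Optional[CharNode], word: str, i: int = 0) -> CharNode:
    """Insert a non-empty word, returning the (possibly new) subtree root."""
    ch = word[i]
    if node is None:
        node = CharNode(ch)
    if ch < node.char:
        node.left = insert(node.left, word, i)
    elif ch > node.char:
        node.right = insert(node.right, word, i)
    elif i + 1 < len(word):
        node.center = insert(node.center, word, i + 1)
    else:
        node.end_of_word = True
    return node


def contains(node: Optional[CharNode], word: str) -> bool:
    """Follow left/right links until a character matches, then the center link."""
    if not word:
        return False
    i = 0
    while node is not None:
        ch = word[i]
        if ch < node.char:
            node = node.left
        elif ch > node.char:
            node = node.right
        elif i + 1 == len(word):
            return node.end_of_word       # matched every character
        else:
            node = node.center
            i += 1
    return False                          # ran off a null link


root = None
for w in ["the", "then", "this"]:
    root = insert(root, w)
```

With these three words, the shared prefix th is stored once; the node for e carries the end-of-word flag for the and leads on to n for then.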


5.3. Hash Tables 

A hash table is a random access device for looking up a word 
by a key, or code value, derived from the characters within
the word. These tables are well suited for holding a 
dictionary of words that increases in size as execution 
proceeds. As with trees, there are many variations on hash 
tables. For reasons of brevity, only two types will be shown 
here. Readers who are interested in further information 
should consult [K1].


5.3.1. Partial Hashing 

The structure of a hash table is based on the fact that by 
using characteristics of a word a mapping can be defined for 
words onto a restricted set of integral numbers that can be 
used as keys (indexes) for locating the words. Such a
mapping can be based on word length and beginning
characters (as was done in the Stanford spelling checker
[G1]) or on all or most of the characters in a word. This
mapping can be done because characters within a word can 
be interpreted as a series of integer values that can be 
combined to produce a unique, or semi-unique, value 
representing the word. This value can then be used as the 
index to locate and verify the word. Such a value can be 
produced by adding and multiplying (shifting) the values for 
individual characters (or groups of characters found in a 
memory cell) and then further randomizing the result by 
division. The method used by the author consisted of adding 
the value of each character to a machine word and circularly 
shifting the result after each addition. After all characters 
had been added, the result was divided by three to further 
randomize the lower portion of the word. This was done so 
that the lower n bits could be extracted as a hash code for 
indexing into a hash table which had 2^n entries. If the hash
table had a prime number of entries (such as 509, 1009, 2003, 
etc.) the hash code could be obtained as the residue after 
dividing the value in the word by the prime number. The 
hash code, derived by whatever method, is used as an index 
into the hash table as shown in Figure 5-3. In this table, 
there are n entries accessed by hash codes from zero to n-1.
Each entry in the table has nodes chained off it for each word 
which can be accessed, or found, by the hash code associated 
with the table entry. As before, null links in the table and 
nodes are represented by X. 
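The add-and-rotate scheme described above can be sketched as follows. The text does not state the machine word size or the shift distance, so the 32-bit word and one-bit circular shift used here are assumptions, as are all of the function names.

```python
WORD_BITS = 32                        # assumed machine word size
WORD_MASK = (1 << WORD_BITS) - 1


def accumulate(word: str) -> int:
    """Add each character value to a machine word, circularly shifting
    the result left by one bit (assumed distance) after each addition."""
    acc = 0
    for ch in word:
        acc = (acc + ord(ch)) & WORD_MASK
        acc = ((acc << 1) | (acc >> (WORD_BITS - 1))) & WORD_MASK
    return acc


def hash_power_of_two(word: str, n_bits: int) -> int:
    """Divide by three, then extract the lower n bits (table of 2^n entries)."""
    return (accumulate(word) // 3) & ((1 << n_bits) - 1)


def hash_prime(word: str, table_size: int) -> int:
    """Residue after dividing by a prime table size such as 509."""
    return accumulate(word) % table_size
```

For the one-letter word a (character value 97), the accumulated value is 194, so hash_power_of_two("a", 9) yields 64 and hash_prime("a", 509) yields 194.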

Hash Table Word Entries 



Figure 5-3. Hash Table With Chained Nodes


Words are accessed by determining the hash code, looking in 
the corresponding position in the hash table, and following 
the pointer found in the hash table (and further pointers in 
the nodes chained off the table) until either the word is found 
or a null link is encountered indicating that the word is not 
in the table. A word can easily be added to a hash table once 
it is determined that the word is not present. This can be 
done by adding the word either at the beginning or end of 
the list of words for an entry. The advantage of a hash table 
is that words can be accessed by means of the hash code, 
thereby eliminating a large number of comparisons as would
be needed in searching a sequential list or tree structure.
The number of comparisons eliminated is directly related to 
the size of the table, the number of words contained in the 
table, and how random the hash code generation is. As can 
be surmised, if a particularly good hashing function is used 
with a very large table, only one comparison would be 
needed. 
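The lookup and insertion procedure for a chained hash table can be sketched as follows. The class names and the simple multiply-and-add hash are stand-ins chosen for the example; any hash function that yields a value from zero to n-1 could be substituted.

```python
class Node:
    """One chained entry: a word and a link to the next node (None = null link)."""
    def __init__(self, word, next_node=None):
        self.word = word
        self.next = next_node


class ChainedHashTable:
    def __init__(self, size=509):
        self.size = size
        self.slots = [None] * size        # each slot heads a chain of nodes

    def _code(self, word):
        # Stand-in hash function (an assumption, not the author's method).
        code = 0
        for ch in word:
            code = (code * 31 + ord(ch)) % self.size
        return code

    def lookup(self, word):
        node = self.slots[self._code(word)]
        while node is not None:           # follow links until found or null
            if node.word == word:
                return True
            node = node.next
        return False

    def add(self, word):
        if not self.lookup(word):         # add only once known to be absent
            slot = self._code(word)
            self.slots[slot] = Node(word, self.slots[slot])  # insert at head


table = ChainedHashTable()
for w in ["the", "then", "this"]:
    table.add(w)
```

Inserting at the head of the chain keeps the addition cheap; inserting at the end, the other option mentioned above, would preserve the order of arrival instead.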


5.3.2. Total Hashing 

If a sufficiently good function is found and the table is made 
suitably large, the need for actually having the words in the 
table can be questioned if the table is only used for looking 
up words and the character representation of a word is not 
needed (such as for the main dictionary of words). In place 
of having a variable length word at each entry in the table, 
a verification code derived by applying a different hash 
function on the word can be used as a check. The structure 
for such a dictionary is shown in Figure 5-4. In normal use, 
the check code will only be one bit indicating if a word 
corresponds to this position. (The reason for using only one 
bit is that other bits can be traded for an increased address 
space unless access to an individual bit is too expensive.) The 
advantage of such a table is that the size of a static 
dictionary can be considerably reduced. Such a dictionary is 
normally built by a special processor that will hash each 
word and put its verification code in the table while checking
to assure that no two words occupy the same position. If 
more than one word maps to a given position, a new hashing 
function is tried until one is found that does not produce such 
a collision. The reduction in dictionary size offered by this 
method has its cost, because some incorrect words can 
mistakenly match the entries for correct words. This should 
not occur frequently if a suitably large hash table is used. 
However, it should be remembered that the upper bound on 
possible misspellings is only somewhat less than infinite, and 
some errors will go undetected. Whether these are errors 
that people would actually make is an unanswered question. 
(Tests would need to be done employing the actual hash 
functions chosen.) However, since the object of the 
dictionary in a spelling checker is not only to find correct 
words, but also, to assure that incorrect words are not found, 
this method should only be used when memory constraints 
dictate its use. 
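The building process for a totally hashed dictionary with one-bit check codes might look like the sketch below. The family of hash functions (a multiply-and-add hash whose multiplier is varied until no two words collide), the table size of 67, and all names are assumptions made for illustration; one byte per entry is used for clarity where a real table would pack bits.

```python
def seeded_hash(word, multiplier, table_size):
    """One member of a hypothetical family of hash functions."""
    code = 0
    for ch in word:
        code = (code * multiplier + ord(ch)) % table_size
    return code


def build_total_hash(words, table_size, max_multiplier=1000):
    """Try hash functions until one maps every word to its own position."""
    for multiplier in range(2, max_multiplier):
        bits = bytearray(table_size)      # 0 = empty, 1 = a word maps here
        for w in set(words):
            pos = seeded_hash(w, multiplier, table_size)
            if bits[pos]:
                break                     # collision: try the next function
            bits[pos] = 1
        else:
            return multiplier, bits
    raise ValueError("no collision-free hash function found")


def check(word, multiplier, bits):
    """True if the word's position is marked. The words themselves are not
    stored, so a misspelling that lands on a marked position is accepted."""
    return bool(bits[seeded_hash(word, multiplier, len(bits))])


WORDS = ["the", "then", "this", "that", "those"]
MULTIPLIER, BITS = build_total_hash(WORDS, 67)
```

The sparser the table, the sooner a collision-free function is found and the less likely a misspelling is to be accepted, which is the space-versus-accuracy tradeoff discussed above.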


[Figure: a table of n Check entries at indexes 0 through n-1]

Figure 5-4. Total Hash Table 


More advanced types of total hashing that have a higher 
degree of reliability have been used in the Chemical 
Abstracts [Z1] and UNIX [U1] spelling checkers. The
hashing techniques are based on the work described in [B1]
and [C2]. These techniques consist of having a hash table of
one bit entries and r independent hash functions. A word 
is entered in the table by setting r bits corresponding to the
r values for the functions. A word is tested by assuring that
each of the r bits is set. Some bits may be set by more than 
one word; however, this need not affect the overall reliability
which can be quite high. (In the UNIX spelling checker, the 
probability of accepting a random misspelled string as 
correct was held to approximately 1 in 2,000 - which is 
considerably less than that introduced by unchecked prefix 
and suffix stripping.) This type of hash table is depicted in 
Figure 5-5. 
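A table of one-bit entries with r independent hash functions can be sketched as follows. The r functions are simulated here by prefixing the word with the function number before applying SHA-256 from the standard library; this choice, the table size, and the class name are assumptions for the example and do not reflect how the Chemical Abstracts or UNIX checkers derived their functions.

```python
import hashlib


class DispersedHashTable:
    def __init__(self, n_bits, r):
        self.n_bits = n_bits
        self.r = r
        self.bits = bytearray((n_bits + 7) // 8)   # packed one-bit entries

    def _positions(self, word):
        # Simulate r independent hash functions with a numbered prefix.
        for i in range(self.r):
            digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos >> 3] |= 1 << (pos & 7)  # set each of the r bits

    def __contains__(self, word):
        # Accept the word only if every one of its r bits is set.
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(word))


table = DispersedHashTable(n_bits=4096, r=4)
for w in ["have", "had"]:
    table.add(w)
```

A word that was entered is always accepted. A string that was never entered is accepted only if all r of its bits happen to have been set by other words, so the false-acceptance rate falls as the table grows sparser.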

[Figure: a row of bits with markers showing the bits set for have and the bits set for had]

Figure 5-5. Dispersed Hash Table


5.4. Length-Segmented Lists 

One type of structure eminently suited for use as a static 
dictionary structure is a length-segmented list. As 
previously indicated, a sequential list of words can be very 
efficient, in terms of storage used, if the words are placed 
next to each other with only a separator character. The 
problem of such a structure is that it is inefficient to search 
even if a set of indexes is used with it. A better way to
resolve the efficiency problem is to restructure the set of 
words so that they can be efficiently accessed. One such way 
to restructure the dictionary is to order it first by word 
length and then by collating sequence within each word 
length. This can be thought of as breaking the dictionary up 
into a set of volumes where the first (a very trivial volume) 
contains words of length one, the second words of length two, 
the third words of length three, etc. Such a structure can 
be visualized as in Figure 5-6 which also shows a length 
index table into the structure. 


Index Table Segments 



Figure 5-6. Length-Segmented List 

Words within each length segment can be found by using the 
index table to determine the starting and ending positions for 
a length segment and then using a binary search to find the 
word. This type of search will be efficient if the dictionary 
is in main storage. If the dictionary is on backing storage, 
it is desirable to have a more refined indexing scheme that 
can point directly into the various parts of each length 
segment. This is desirable because it will cut down on the 
number of read operations necessary to find a word. Such 
an index can consist of the first one, two, or three characters 
for words of a given length. The index table can be ordered 
by length or beginning character sequence as the primary 
item. Figure 5-7 shows a table with the leading character 
as the primary index. The table depicts the general type of 
structure needed for accessing the length segments. In 
actual use, a greater refinement in locating length-segment
positions on secondary storage (by use of more characters
from the word) would be employed. In addition, each length 
sub-table would contain an entry indicating the maximum 
length word for the prefix (thereby reducing the size of the 
sub-table needed). In a like manner, if indexing was to be 
done with length as the primary (first) index, each 
character-prefix sub-table would have a range associated 
with it. 
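For a dictionary held in main storage, the length index table and the binary search within a segment can be sketched as follows. A Python list of strings stands in for the packed character storage, and the function names are hypothetical; the finer character-prefix indexing needed for backing storage is not shown.

```python
from bisect import bisect_left


def build_segments(words):
    """Order words by length, then by collating sequence within each length,
    and record the (start, end) positions of each length segment."""
    ordered = sorted(set(words), key=lambda w: (len(w), w))
    index = {}
    for pos, w in enumerate(ordered):
        start, _ = index.get(len(w), (pos, pos))
        index[len(w)] = (start, pos + 1)
    return ordered, index


def lookup(ordered, index, word):
    """Use the index table to bound the segment, then binary search it."""
    if len(word) not in index:
        return False                      # no words of this length at all
    start, end = index[len(word)]
    i = bisect_left(ordered, word, start, end)
    return i < end and ordered[i] == word


ORDERED, INDEX = build_segments(["a", "i", "an", "at", "the", "this", "then"])
```

In this small example the index table maps length 4 to positions 5 through 6, so a four-letter probe is compared against at most two entries.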


Character Length 



Figure 5-7. Two Level Index Table 


The index table can be thought of as a hash table where the 
hash code is based on the word length and beginning 
character sequence. The table will normally be built by a 
special processor that will also build the length segments on 
secondary storage. Once built, the index table will normally 
reside in main storage as a static structure. As can be 
expected, the greater the refinement in this table, the more 
space it will take up. By proper structuring of such an index 
table, even a very large dictionary contained on secondary 
storage can be efficiently accessed. (Tests were done on a 
dictionary of approximately 1/4 million words using a 3-level 
digraph index. Access to most words could be done with only 
one or two I/O operations. Words, however, which began 
with common prefixes such as un or re took longer.) 
Performance of this method will, of course, depend on I/O 
buffer size, index length, and other factors. 

As with other dictionary structures, a length-segmented list 
can be used for more than just checking the spelling of words. 
Each word in the list can have associated with it information 
such as hyphenation points, meanings, valid suffixes, etc.
This could be done by having extra space allocated in front 
of each word to store the information. However, a more 
efficient representation is to have a corresponding (parallel) 
table that contains an information entry for each word in the 
segmented list. In this way, unnecessary information is not 
read in while searching for the correct word. 

5.5. Compact Representations 

As may have already been noted, there is considerable 
interest in having a very compact dictionary. This is 
especially true when working on a small computer such as is 
often found in an automated office environment. This need 
has given rise to the use of prefix and suffix stripping as well
as to the creation of totally hashed dictionaries. These
methods, however, have their tradeoffs in reduction of 
dictionary space versus accuracy. Another method that can 
be used without loss of accuracy is to take advantage of the 
fact that a large number of the characters within a given 
character set are not used for words and will never appear 
in the dictionary. Therefore, a more compact character set 
can be used to store words in a dictionary. Such an approach 
was used in the Stanford spelling checker [G1] and is
applicable to most machines. If the distinction between
uppercase and lowercase characters is to be preserved,
characters can be represented as integer values in the range
of 1-52. These values will fit in a 6-bit field with some
values left over for other characters. If the distinction 
between uppercase and lowercase characters is not to be 
checked, a 5-bit field will suffice. 
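The 5-bit case can be sketched as follows, with the letters a through z represented as the values 1 through 26 and packed into consecutive 5-bit fields. The function names and the big-endian byte layout are assumptions for the example; the word length is assumed to be known from elsewhere (for instance, from a length segment).

```python
def pack5(word):
    """Pack a lowercase word into 5-bit fields, most significant field first."""
    value = 0
    for ch in word:
        code = ord(ch) - ord("a") + 1     # a..z -> 1..26
        if not 1 <= code <= 26:
            raise ValueError(f"not a lowercase letter: {ch!r}")
        value = (value << 5) | code
    n_bytes = (5 * len(word) + 7) // 8    # round up to whole bytes
    return value.to_bytes(n_bytes, "big")


def unpack5(data, length):
    """Recover a word of the given length from its packed form."""
    value = int.from_bytes(data, "big")
    chars = []
    for shift in range(5 * (length - 1), -1, -5):
        code = (value >> shift) & 0x1F
        chars.append(chr(code + ord("a") - 1))
    return "".join(chars)
```

An eight-letter word such as spelling occupies 40 bits, or five 8-bit bytes instead of eight.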


6. Obtaining a Dictionary 

The most difficult task in developing a spelling checker is 
obtaining the right dictionary. At first, it may appear that 
there should be an easy way to obtain a dictionary, namely, 
buy one from a company that produces dictionaries. This, 
however, has several problems: 


■ Most companies that produce dictionaries are reluctant 
to sell a machine readable copy of a dictionary even if 
it is stripped of all meanings and consists only of 
headwords. Some companies now sell such dictionaries 
although the price is substantial and normally includes 
a royalty clause for products based on the dictionary. 

■ If such a dictionary can be obtained, it will not contain 
all forms of a word. Most notable of the words missing 
will be plurals, but a large number of other forms and
application-specific terms will be missing. 

■ A large dictionary from a publisher will contain many 
archaic and esoteric words which would be recognized as 
valid words rather than the misspellings which they 
normally would be. (Some companies now offer more 
restricted application-oriented dictionaries that are 
more suitable.) 


■ Most dictionaries, even if they are for general use, will 
contain a large number of application-specific technical 
words which can often be confused with more common 
words of a similar spelling. For example: 


Technical Word    Common Word
synapsis          synopsis
tensor            tenser
laver             lever
quark             quirk
serine            serene


■ A general dictionary will contain many similar words 
that are not suitable for some applications. For 
example, in a business environment cheap would 
probably be an acceptable word, whereas, cheep would 
not. 


■ Odd as it may seem, even a dictionary obtained from a
publisher will often contain some spelling errors, 
although the percentage of errors is normally quite 
small and is orders of magnitude less than in a 
dictionary obtained by other means. 


These problems should not stop a person from trying to 
obtain a dictionary. If such a dictionary can be obtained, it
can be a valuable aid in verifying words even if not used as
a dictionary itself. Verification can be done by producing the 
logical intersection of a word list and a dictionary. In this 
way, word candidates can be checked against a known source 
and considerably ease the process of checking words. This 
approach was used in the UNIX spelling checker, the latest 
spelling checker built by the author, and undoubtedly many 
others. 

If a dictionary is not available, the process of word 
verification is a little more difficult although not impossible. 
An approach that can be taken is to use frequency counts as 
an aid in word verification. If a word appears frequently in 
a document, it is probably correctly (or at least consistently) 
spelled. If the word also appears in other unrelated 
documents with a similar frequency, its probability of being 
correctly spelled is considerably higher. A suitable value for 
a frequency count that is sufficiently high to give a 
reasonable assurance of correct spelling will depend on the 
document size and may need to be determined by 
experimentation. Ten, however, is a reasonable value for an 
arbitrary choice. After frequency counts have been obtained 
for 10 or 20 documents, those words which are common to at 
least half of the documents can be culled out. These words
should all be correctly spelled (unless the same poor speller 
wrote all the documents). The words obtained should be 
scanned for any possible errors. After any have been 
removed, the words can be used as an initial dictionary. 
After the initial word list has been obtained, the frequency 
threshold can be lowered and another list of words obtained. 
This list will need to be more carefully examined for errors, 
but will probably consist mostly of valid words. After these 
words have been checked, they too can be added to the 
growing dictionary list. After a reasonable number of words 
have been obtained (2,000 for example), it is best to use the 
words as a base dictionary and begin actually checking 
documents. After each document has been checked, correct 
words from it may be added to the dictionary being formed. 
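The first pass of the frequency-count method can be sketched as follows. The function name, the rule for splitting text into words, and the default parameters (a count of ten, and presence in at least half of the documents) are assumptions drawn from the figures suggested above; the resulting list would still need to be scanned by hand for errors.

```python
import re
from collections import Counter


def candidate_words(documents, min_count=10, min_fraction=0.5):
    """Return words that reach min_count occurrences in at least
    min_fraction of the documents."""
    qualifying = Counter()                # word -> number of documents
    for text in documents:
        counts = Counter(re.findall(r"[a-z]+", text.lower()))
        for word, count in counts.items():
            if count >= min_count:
                qualifying[word] += 1
    needed = min_fraction * len(documents)
    return sorted(w for w, n in qualifying.items() if n >= needed)
```

Calling the function again with a lower min_count yields the second, less reliable list, from which words already accepted can be removed before inspection.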

The above method was the one originally used by the author. 
Another method which was used in producing the dictionary 
for the Chemical Abstracts spelling checker [Z1] is a good
alternative if access to a large data base of proofread 
documents is possible. This consists of performing a 
frequency analysis of all the documents as a whole and using 
the results, or part of the results, as the basis for forming 
a dictionary. Again the list must be scrutinized because even 
documents that have received extensive proofreading will 
contain errors. (An informal survey by the author of a 
limited number of existing data bases for proofread and 
published documents found a considerable number of errors, 
some as great as one per page.) Most often, errors in a 
proofread document will be words with a low frequency 
count. Words which appear once should be especially 
suspect. Words with the highest frequency count should be 
the most reliable although some may be abbreviations or 
acronyms which it might be desirable to exclude from the 
dictionary. 

In the early sixties, a word frequency study of English text 
was done at Brown University. The survey was done on a 
diverse set of texts from different disciplines and sources 
(newspapers, books, technical journals, etc.). The objective of 
the study was to analyze words as they appeared in actual 
text and, therefore, the researchers strove to preserve the words as
they appeared in the original text. The results of this study 
were reported in [K2], and computer tapes of the survey are 
still available. However, the survey was not done to collect 
words for the dictionary of a spelling checker. Therefore, not
all of the information is applicable to this use and the survey
does contain spelling and typographical errors as they
appeared in the study text. Likewise, because the survey was
done on text from a diverse set of disciplines, it contains a 
number of valid words which, because of their unique nature, 
might not be appropriate for use in most spelling checkers. 
Nonetheless, this survey (and similar ones) is a valuable source
of information and should not be overlooked. 

The best dictionary a spelling checker can have is one 
derived from the type of documents it actually will check. 
Such a dictionary will not have excess garbage words that 
will not be used and can be mistaken for errors in the text. 
Therefore, it is highly recommended that the extra work of 
a frequency analysis be done when forming a dictionary. It 
should be noted, however, that if one relies on the documents 
at hand, and they are limited in scope, the dictionary will 
also be limited. If this is the case, it may be desirable to 
selectively augment the dictionary with words obtained from 
a broader base of text. 

If stripping is being done in the spelling checker, more work 
will need to be done with the words collected, since the 
prefixes and suffixes will need to be stripped off and only the 
root word kept for insertion in the dictionary. If prefix and 
suffix stripping is not being incorporated in the spelling 
checker, it may be desirable to build a special processor to 
do stripping and check the root word as an aid to word 
verification. However, as noted before, it is not a sufficient 
test for a word to be correctly spelled because it has a correct 
prefix, root, and suffix. 

7. Analysis of Word Usage 

The characteristics of how words are used in a document can 
have an effect on the design of a spelling checker. For 
example, characteristics that relate to word frequency, word 
length, and word distributions can influence how a spelling 
checker is constructed. Figures 7-1 and 7-2 show some of the 
results obtained from a study done by the author on a set of 
23 documents with a total of over 400,000 words. The 
documents were a selection of memos, letters, papers, 
summaries, and reference manuals that were readily 
available in machine readable form. The documents were 
related to the production, management, and description of 
computer software. In that sense, and due to the small 
sample size, the study is biased. The results, however, 
correlate with other studies such as [C1] and [K2]. For the
purpose of this study, words were broken at hyphens and 
apostrophes, words which contained dollar signs, 
underscores, or digits were ignored, and uppercase and 
lowercase characters were considered to be identical. This 
was done to ensure that unique words represented in the 
study would approximately correspond to words that would 
be contained in a dictionary for a spelling checker. It is not 
a direct correspondence because some words such as 
abbreviations and acronyms may be specific to an individual 
document and may, therefore, not be appropriate for a
dictionary. In addition, it may be desirable to preserve the 
capitalization on some words which are to be contained in the 
dictionary. 

As can be seen in Figure 7-1, the largest proportion of words 
found in a document tend to be small words. This, however, 
should not be a surprise, as it is basically a confirmation of 
Zipf's law which states that frequently used words will tend 
to be smaller. This fact can be taken advantage of in the 
design of a spelling checker because a small compact 
dictionary can be used to recognize the majority of the words 
in a document. Figure 7-2 shows another interesting aspect 
of word usage that can be taken advantage of in the design 
of a spelling checker. 



Figure 7-1. Word Length Versus Frequency 
[Figure: plot of the number of unique words (axis marked 0 to 3k) against the total number of words in a document (axis marked 0 to 60k)]

Figure 7-2. Number of Words in a Document Versus Unique Words

As can be seen in Figure 7-2, the number of unique words used in
a given document is not that large and does not grow in
direct relationship to the document size. This also should not 
be too surprising, because a document is normally related to 
a specific topic, and therefore, its terminology should be 
either general words or words specific to that topic. Another 
reason that the number of unique words in a document does 
not grow in proportion to its size is that people tend to use 
the same words over and over (sometimes almost beating 
them to death). Therefore, when an obscure or unique word 
is found in a document, the probability of it occurring again 
is greatly increased. However, the probability for other 
words not yet found is not increased. This fact can be taken 
advantage of in the design of a spelling checker by 
incorporating into the checker a cache memory so that 
previously seen words can be readily accessed. The cache can 
have words added to it, when needed, by accessing a much
larger dictionary on secondary storage. This larger 
dictionary can have a slower access time without 
significantly affecting performance. 
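The cache arrangement can be sketched as follows. The class name and the master_lookup callable (standing for whatever slow search is made on secondary storage) are hypothetical. As a further assumption, this sketch also remembers words that were not found, so that a repeated misspelling does not cause repeated accesses to the larger dictionary.

```python
class CachedDictionary:
    def __init__(self, master_lookup):
        self._master_lookup = master_lookup   # slow search, e.g. on disk
        self._cache = {}                      # word -> found or not
        self.master_accesses = 0              # counts the slow searches

    def is_known(self, word):
        if word not in self._cache:           # first time this word is seen
            self.master_accesses += 1
            self._cache[word] = self._master_lookup(word)
        return self._cache[word]


# Hypothetical stand-in for a master dictionary on secondary storage.
MASTER = {"spelling", "checker"}
checker = CachedDictionary(MASTER.__contains__)
```

Because a word that has appeared once is likely to appear again, most lookups after the first few pages of a document are answered from the cache.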

8. Building a Spelling Checker 

Taking the information already presented, we are now at the 
point where we can put it together to form a spelling checker.
The basic structure of a spelling checker is shown in Figure 
8-1. This structure is similar in most respects to one built 
by the author. 




Document to be Checked 



The various pieces of the spelling checker are: 


Lexical Analyzers 

The boxes labeled Lex 1, Lex 2, etc. are lexical analyzers
that correspond to different sets of rules for recognizing 
and forming words in the document. Each analyzer has 
a slightly different set of rules as to what constitutes 
text, punctuation, and documentation processor 
commands. The output of the lexical analyzers consists 
of words that are passed on to the word checker. 

Word Checker 

The word checker takes each word, performs any prefix 
or suffix stripping (if this is being done), and checks to 
see if the word is in the high frequency or cache 
dictionary. If it is not, it will be considered an error. 
If prefix and suffix checking is being done, the process 
of checking could take several iterations as different 
forms are tried. 


Master Dictionary

This dictionary is normally a large dictionary on
secondary storage that is in a special form (such as a
length-segmented list) for fast access by the spelling
checker.

Cache

The cache is a random access list of words that are
brought in from the master dictionary. It will normally
be implemented as a hash table, but also could be a tree
containing word entries.

High Frequency Dictionary

This dictionary is built into the spelling checker and
consists of a list of several thousand of the most
frequently used words. The list should be derived from
a study done on word frequencies on text similar to that
which is expected to be processed. All words that appear
in this dictionary should also appear in the master
dictionary (in case the high frequency dictionary is
changed). This dictionary is used to improve
performance by avoiding a search of the master
dictionary for the most frequently used words.

Stripping Dictionary

This dictionary, if it exists, contains prefixes and
suffixes for stripping. As such, it would actually be two
dictionaries, one for each purpose. It can efficiently be
represented by two tree structures where each node
contains a single character and links to other nodes in
the tree.

The structure shown here should not be considered as the
only or final form of a spelling checker. There are a number
of alternatives and enhancements that can be considered and
implemented if desired. A few of these include:

Private Dictionary

It is often desirable to have a private dictionary which
can be used either with an individual document or a
group of documents. Such a dictionary can greatly
simplify the process of rechecking a document because
unique words and abbreviations can be accounted for. In
addition, a private dictionary can be used to supply
words that have not yet been added to the master
dictionary. Such a dictionary is normally in the form
of symbolic text that can be dynamically read into the
spelling checker. The words can be placed in the cache
or a similar structure dedicated for this purpose.

Override Dictionary

As well as being able to add words to the dictionaries,
it is also desirable to be able to override words that are
in the dictionaries. Override words can also be read into
the cache with other words and marked as such. This
capability may not be necessary if the user can easily
rebuild the dictionaries used with the spelling checker.

Word Output List

If a private dictionary capability is provided, it is often
desirable to provide a mechanism whereby words flagged
by the spelling checker can be collected as output in the
form of symbolic text. Such a word list can then be
edited and combined with other such lists to form a
private dictionary.

These are only a few of the additional possible features that
can be added to a spelling checker. Users will undoubtedly
find many others that are based on their own unique needs.

9. Checking Beyond the Word Level


As indicated earlier, it would be best if a spelling checker 
could do checking beyond the word level. In the future, more 
comprehensive spelling checkers will no doubt do more in 
this area. For today, however, there are some checks which 
can be made by the addition of a simple memory to the 
checking process. This was done in the checker implemented 
by the author to catch the frequent typographical error of 
repeating a word next to itself. Such an error normally 
occurs when a writer is momentarily interrupted and the 
train of thought must be resumed. Such errors are often 
caught by the writer when proofreading text. However, a 
similar interrupt occurs when typing text and the end of line 
is reached. The result can be that the same word is typed 
at the end of a line and the beginning of the next line. Such 
an error will very often not be caught when proofreading the 
text in its original form because the reader will experience 
a similar interrupt at the same point. The error will,
however, become strikingly obvious when text is composed
and the words appear next to each other in the middle of a
line. This type of check is not foolproof as it will flag 
occurrences such as had had which is a valid word sequence 
though objected to by many editors and grammarians. 
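
The repeated-word check described above can be sketched as follows. This is a minimal illustration in Python, not the author's implementation; the function name and the regular expression used to pick out words are assumptions made for the example. The single variable `previous` plays the role of the "simple memory", and because it is carried across line boundaries, a word repeated at the end of one line and the start of the next is caught.

```python
import re


def find_repeated_words(text):
    """Return (line_number, word) for each word that repeats the word before it."""
    hits = []
    previous = None  # the one-word transient memory
    for line_number, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[A-Za-z']+", line):
            if previous is not None and word.lower() == previous.lower():
                hits.append((line_number, word))
            previous = word  # remembered across the line break
    return hits


sample = (
    "The result can be that the same word is typed at the\n"
    "the beginning of the next line. He had had enough."
)
print(find_repeated_words(sample))
```

As the text notes, the check is not foolproof: the sketch flags the valid sequence "had had" along with the genuine error.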


Another check that can be done on word pairs is for the 
occurrence of word pairs such as can not that are normally 
spelled as a single word. This type of checking, however, will 
require more than a transient memory which is all that is 
needed for identical word pair matching. 

There are many other checks similar to word pair checking 
that can be done, though most will not have as great a 
return. Another such check which the author has been 
considering, though has not yet implemented, is the checking 
of foreign phrases which appear frequently in English text. 
Examples include a la carte, deja vu, per se, and a la
mode. Many of the words in such phrases will
inadvertently be flagged although they are perfectly 
appropriate within the phrase. Such checking could be done 
by appropriately expanding the transient memory and 
having a phrase dictionary that could be stepped through as 
words are checked. Such a phrase dictionary could easily be 
constructed as a tree structure incorporating the 
characteristics of both the word and character tree 
structures we have already seen (i.e., each word in a phrase 
would be the equivalent of a character). A similar type of 
checking could also be employed to check for overused or 
redundant phrases such as at that point in time. 
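
A phrase dictionary of this kind might be sketched as a tree in which each edge is a whole word rather than a character. The code below is an illustrative assumption, not the author's design: the names `build_phrase_tree`, `longest_phrase_at`, and the `END` marker are hypothetical, and nested Python dictionaries stand in for the tree nodes.

```python
END = "$end"  # hypothetical marker: a complete phrase ends at this node


def build_phrase_tree(phrases):
    """Build a tree whose edges are words, so each word acts like a character."""
    root = {}
    for phrase in phrases:
        node = root
        for word in phrase.lower().split():
            node = node.setdefault(word, {})
        node[END] = True
    return root


def longest_phrase_at(words, start, tree):
    """Return the word count of the longest phrase starting at words[start], or 0."""
    node = tree
    best = 0
    for offset in range(start, len(words)):
        node = node.get(words[offset].lower())
        if node is None:
            break
        if END in node:
            best = offset - start + 1
    return best


tree = build_phrase_tree(["a la carte", "a la mode", "per se", "deja vu"])
words = "dessert was served a la mode".split()
print(longest_phrase_at(words, 3, tree))
```

A checker could step through this tree as words are read and suppress the flagging of any word covered by a matched phrase. The same structure would serve for a list of overused phrases, with a match reported instead of suppressed.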

10. User Interface 

One aspect of a spelling checker which has not yet been 
addressed is the user interface. Spelling checkers can, and 
have, taken on different forms. The main difference is 
whether or not a checker is used on-line in an interactive 
manner. In some part, the form of a spelling checker will 
also be dependent on the facilities available to the user and 
the mode in which text updating and entry is done. If the 
process of checking is done separately from text updating and 
entry, it is sufficient to produce a cross reference listing of 
where possible misspellings occur in a document. This is 
normally the case when large documents are worked with, or 
where service personnel are responsible for doing text 
changes. It is also desirable, however, when the checking is 
done off-line as part of proofreading. If text entry and 
updating is done on-line, such as might be done by a 
secretary, it is normally desirable to have an interactive 
spelling checker that can step through a document until a 
questionable word is found. After this has been done, the 
spelling checker can interact with the user to determine how 
to resolve the word. Some common facilities that can be 
provided in an interactive checker include: 

■ Editing capability to change misspelled words. This can 
be for a single instance of a word or for all instances in 
a document. 

■ Alternative correct spelling generation. Such a 
capability normally consists of applying a set of 
permutations on the misspelled word. The resultant 
words are then looked up to see if they are found in the 
dictionary and any correct words are displayed to the 
user. The rules normally consist of inverting letter 
pairs (a frequent typographical error), inserting 
characters, deleting characters, and replacing 
characters (especially vowels). (References [P1] and [P2]
go into further detail on this.) Another approach that 
can be used is to have a dictionary of commonly 
misspelled words and their correct spellings. Such a 
dictionary could be obtained by monitoring the spelling 
correction process over a period of time. 

■ Ability to remember a correct word not found in the
dictionary. This memory can be for a single session or
a more permanent memory for future sessions. (A
permanent memory can also be provided by means of a
private dictionary.)

■ On-line dictionary lookup capability. This is simply the 
capability to browse through the dictionary in the hope 
of running across the correct spelling for a word. 
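
The generation of alternative spellings listed above can be sketched as follows. This is an illustrative example under stated assumptions rather than the method of any particular checker: the function names are hypothetical, the alphabet is restricted to lowercase ASCII letters, and replacement is tried with every letter rather than favoring vowels.

```python
import string


def candidate_spellings(word, alphabet=string.ascii_lowercase):
    """Apply the four permutation rules once each to a questionable word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    transposed = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    deleted = {a + b[1:] for a, b in splits if b}
    replaced = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserted = {a + c + b for a, b in splits for c in alphabet}
    return (transposed | deleted | replaced | inserted) - {word}


def suggest(word, dictionary):
    """Keep only the candidates that are found in the dictionary."""
    return sorted(candidate_spellings(word) & dictionary)


dictionary = {"the", "then", "he", "they", "tie"}
print(suggest("teh", dictionary))
print(suggest("thn", dictionary))
```

Only the candidates that survive the dictionary lookup would be displayed to the user, so the large number of generated strings is not itself a problem for the interface.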

A good interactive spelling checker can be a very useful tool;
however, the option of not doing interactive checking should
always be available. The reason for this is that interactive 
checking can be very time consuming if a large document is 
being checked or if a small document that contains a large 
number of unique abbreviations, or special terms not found 
in the dictionary, is being checked. The user interface 
chosen will, of course, depend on the facilities available and 
the needs of the users. 

11. One Implementation 

So far, little has been explicitly stated about the actual form 
and content of the spelling checker built by the author. This 
has been deliberately delayed until now to avoid prejudicing 
the reader. The form of a spelling checker (much like the 
content of its dictionary) should depend on its intended use 
and the facilities available. Before actually pinning down 
the details, it is worthwhile to look at the environment that 
existed when the spelling checker was built. The main points 
are summarized as follows: 

■ Computer memory, both primary and secondary, was not 
a major factor as document preparation and checking 
was done on a large computer. 

■ Document composition was normally done on a dedicated 
processor as a separate step. 

■ Text entry and updating was normally done on-line, 
often by service personnel, and usually with the aid of 
a marked up hard copy of the document (i.e., writing and 
proofreading were normally done off-line). 

■ The documents to be processed were normally in the 
range of twenty to five hundred pages, although smaller 
documents were sometimes encountered. 

As is frequently the case in programming, the intended use
for which a program is designed has little bearing on the
way people actually use it. Such is the case with the spelling
checker built by the author, which is now often used for 
checking one page memos and small papers in what is 
basically a totally on-line mode. This, however, is another 
issue. Getting back to the point at hand, the spelling checker 
that was actually built has the following main 
characteristics, which are further described in the user guide 
for the spelling checker [S1].


11.1. Word Recognition 

The spelling checker was built with five lexical analyzers, 
three for different document processors and two for different 
forms of plain text. Each analyzer could dynamically react 
to run-time parameters so that words could be broken (or 
remain joined) at hyphens, apostrophes, and accent marks. 
Words that corresponded to variable names, as found in most 
programming languages, could be excluded from checking by
another option. Case distinctions for characters in a word 
were preserved so that errors in the capitalization of proper 
names, such as trademarks, could be caught. Words were
categorized into four categories: all lowercase, all uppercase,
leading capital letter, and mixed case (e.g., McCarthy). Word
matching was done according to the following rules: 

■ A lowercase word in a dictionary could match a word of
any of the four categories (e.g., any could match itself
or ANY, Any, or AnY).

■ An uppercase word in a dictionary could only match its
uppercase form in text (e.g., EBCDIC could not match
ebcdic, Ebcdic, or ebcDIC).

■ A leading capital word in a dictionary could only match
itself or its uppercase form in text (e.g., John could
match John or JOHN but not john or JOhn).

■ A mixed case word in a dictionary, like a leading capital
word, could only match itself or its uppercase form in
text.

This preservation of case was done, as part of an underlying 
design philosophy, so that the highest degree of checking 
possible could be achieved. 
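
The four matching rules can be expressed compactly. The following is a sketch in Python that follows the rules as stated above; the function names and category labels are assumptions for illustration and are not taken from the spelling checker itself.

```python
def case_category(word):
    """Classify a word as 'lower', 'upper', 'leading', or 'mixed'."""
    if word.islower():
        return "lower"
    if word.isupper():
        return "upper"
    if word[0].isupper() and word[1:].islower():
        return "leading"
    return "mixed"


def matches(dictionary_word, text_word):
    """Apply the case rules to decide whether a text word matches an entry."""
    if dictionary_word.lower() != text_word.lower():
        return False  # different letters: never a match
    category = case_category(dictionary_word)
    if category == "lower":
        return True  # a lowercase entry matches any capitalization
    if category == "upper":
        return text_word == dictionary_word
    # Leading-capital and mixed-case entries share the same rule.
    return text_word in (dictionary_word, dictionary_word.upper())


for entry, word in [("any", "AnY"), ("EBCDIC", "ebcdic"),
                    ("John", "JOHN"), ("John", "JOhn")]:
    print(entry, word, matches(entry, word))
```

Under these rules the capitalization stored in the dictionary determines how strict the check is, which is how a trademark or proper name entered with its exact case can catch capitalization errors.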


11.2. Word Checking

The use of prefix and suffix stripping was not employed in 
the spelling checker, again as a matter of design philosophy 
(although it was used in a separate processor as an aid in 
word verification). It was decided that reliance would be
placed on a large dictionary that would contain all forms of 
the words to be checked even though it was known that such 
a dictionary would have to be built in an evolutionary 
manner. This decision prompted, to some extent, the 
inclusion of a private dictionary facility in the spelling 
checker because it could be used to augment the built-in 
dictionaries. The built-in dictionaries included a 
high-frequency dictionary, a master dictionary on secondary
storage accessed via a cache, and an auxiliary dictionary
(normally consisting of unique terms) that could be built into
the spelling checker and enabled or disabled for each 
document checked. 

The cache memory, which was implemented as a hash table, 
was used for many purposes. It not only contained words 
brought in from the master dictionary, but also contained 
words from the private dictionary (if used), and words which
were not found in any dictionary (i.e., the list of possible 
misspellings). After the complete document had been 
checked, the list of possible errors was removed from the 
cache and reordered as a tree structure with words sorted in 
dictionary order (which is different from collating sequence 
order when uppercase and lowercase characters are distinct). 
After this was done, a cross reference listing was generated 
indicating the line number in the document where each 
questionable word appeared. In addition to, or in place of, 
producing the cross reference listing, a symbolic text 
containing the questionable words could be produced. This 
was normally used to capture words for a private dictionary. 
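
The difference between dictionary order and collating sequence order can be seen in a small example. The sketch below is only an illustration: it assumes Python's default string comparison (by character code) as a stand-in for a machine collating sequence, and it uses a sorted list where the checker used a tree.

```python
words = ["zebra", "Apple", "apple", "Zebra", "EBCDIC", "ebony"]

# Collating sequence order: compares character codes, so in this
# character set every uppercase word sorts ahead of every lowercase word.
collating_order = sorted(words)

# Dictionary order: compare without regard to case first, and use the
# original spelling only to break ties between otherwise equal words.
dictionary_order = sorted(words, key=lambda w: (w.lower(), w))

print(collating_order)
print(dictionary_order)
```

In dictionary order the variants of a word stay together, which is what a reader scanning a listing of questionable words would expect.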


11.3. The Future 

As indicated earlier, the basic premise of use, on which the
spelling checker was designed, has changed. The result of 
this has been a reevaluation of the form that the spelling 
checker should take. Basically, this has resulted in a 
decision that some aspects of the spelling checker will be 
changed to make it more versatile in the interactive 
environment in which it has come to be used. Its basic
structure, however, will remain unchanged, although the
user interface will be extended to provide some of the 
interactive capabilities indicated earlier. Other changes will 
undoubtedly be made later as the program evolves with use 
and as it becomes clearer what is needed for the particular 
environment in which it is being used. Implementors of 
other spelling checkers will undoubtedly encounter a similar 
evolution. 


References 

[B1] Bloom, B. H. "Space/Time Trade-offs in Hash Coding
with Allowable Errors" CACM Vol. 13 No. 7 (June
1970) 422-426

[C1] Carroll, J. B., Davies, P. and Richman, B. The
American Heritage Word Frequency Book. American
Heritage Publishing Company 1971

[C2] Carter, L., Floyd, R., Gill, J., Markowsky, G. and
Wegman, M. "Exact and Approximate Membership
Testers" Proceedings of the 10th ACM Symposium on
the Theory of Computing, 1978 59-65

[G1] Gorin, R. E. "SPELL: Spelling Check and Correction
Program", Documentation for Stanford Spelling
Checker, Private Communication, February 1980

[K1] Knuth, D. E. "Sorting and Searching" The Art of
Computer Programming, Vol. 3, Addison-Wesley
Publishing Company 1973

[K2] Kucera, H. and Francis, W. N. Computational
Analysis of Present-Day American English. Brown
University Press 1967

[M1] Morris, R. and Cherry, L. L. "Computer Detection of
Typographical Errors" IEEE Transactions on
Professional Communication, Vol. PC-18, No. 1 (March
1975) 54-64

[P1] Peterson, J. L. "Computer Programs for Spelling
Correction" Lecture Notes in Computer Science Vol.
96, Springer-Verlag 1980

[P2] Peterson, J. L. "Computer Programs for Detecting and
Correcting Spelling Errors" CACM Vol. 23 No. 12
(December 1980) 676-687

[S1] Sperry Univac Series 1100 Spelling Checker Level
1R1, User Guide, UP-8979 1980

[T1] Turba, T. N. "General Syntax Analyzer (GSA)" ACM
SIGPLAN Notices Vol. 14 No. 12 (Dec. 1979) 92-109

[U1] UNIX Spelling Checker (SPELL), Private
Communications with Bell Laboratories Personnel 1980

[W1] Wang, C. H. C., Mitchell, P. C., Rugh, J. S. and
Basheer, B. W. "A Statistical Method for Detecting
Spelling Errors in Large Data Bases" Digest of Papers
for Spring COMPCON 77 (February-March 1977)
124-128

[Z1] Zamora, A. "Control of Spelling Errors in Large Data
Bases" The Information Age in Perspective -
Proceedings of the ASIS Annual Meeting Vol. 15 1978
364-367




Computer Aids for Writers 

Lorinda Cherry 

Bell Laboratories 
Murray Hill, New Jersey 07974 




ABSTRACT 

For many people, writing is painful and editing one’s own prose is difficult, tedious, and error- 
prone. It is often hard to see which parts of a document are difficult to read or how to transform a 
wordy sentence into a more concise one. It is even harder to discover that one overuses a particular 
linguistic construct. The system of programs described here helps writers to evaluate documents and to 
produce better written and more readable prose. The system consists of programs to measure surface 
features of text that are important to good writing style as well as programs to do some of the tedious 
jobs of a copy editor. Some of the surface features measured are readability, sentence and word length, 
sentence type, word usage, and sentence openers. The copy editing programs find spelling errors, wordy 
phrases, bad diction, some punctuation errors, double words, and split infinitives. 


March 10, 1981 




1. Introduction 

Computer programs to format documents are now 
widely used. We have programs to typeset our docu- 
ments, even those with complicated mathematics, tables 
and line drawings. However, other than spelling checkers, 
little has been done to help writers produce better docu- 
ments. The system of writing or editing tools described 
here is a first step toward such help. The system includes 
programs and a data base to analyze writing style at the 
word and sentence level as well as programs to do some of 
the mechanical work of a copy editor. I use the term 
style in this paper to describe a writer’s particular 
choices among individual words and sentence forms. 
Although many judgements of style are subjective, partic- 
ularly those of word choice, there are some objective 
measures that experts agree lead to good style. A docu- 
ment that conforms to the stylistic rules is not guaranteed 
to be coherent and readable, but one that violates all of 
the rules is likely to be difficult or tedious to read. All 
the programs described in this paper run under the UNIX†
Operating System; most have been incorporated into the
system called "The Writer’s Workbench" [1] by N. H.
Macdonald, L. T. Frase, and S. A. Keenan.

Section 2 briefly describes PARTS, a system for 
finding parts of speech in English text, which is the foun- 
dation for much of the other work. Section 3 describes 
the programs that use the part of speech output. Section 4 
discusses the copy editing programs. 

2. PARTS 

The PARTS system[2] is a set of programs to assign 
word classes to English text. It has been used as the basis 
of several programs to analyze English text and to locate 
particular linguistic constructs. PARTS consists of three 
programs. The first program does a variety of preprocess-
ing functions and looks up each word in a small
dictionary. The second program checks for certain suf-
fixes. The third program does the work of assigning word
classes to words that the first two programs did not handle
or, sometimes, correcting wrong word classes assigned by
the first two programs. Corrections may be needed
because the program assigns word classes to tokens (i.e.,
words as they are used) not types (i.e., dictionary entries).

Permission to copy without fee all or part of this material is granted
provided that the copies are not made or distributed for direct com-
mercial advantage, the ACM copyright notice and the title of the
publication and its date appear, and notice is given that copying is
by permission of the Association for Computing Machinery. To
copy otherwise, or to republish, requires a fee and/or specific permis-
sion.

Author's present address: Room 2C-516 Bell Laboratories, Murray
Hill, New Jersey 07974
† UNIX is a Trademark of Bell Laboratories.

© 1981 ACM 0-89791-043-5/81/0600/0062 $00.75

2.1. Dictionary and Preprocessing phase 

The first program identifies words and sentences in 
the text. A word is a string of characters separated by 
either punctuation or blanks. Certain abbreviations, like 
Mr., Ms., Dr., etc., are identified and treated as words. 
Numbers in various forms (10,000, 1st, 2’s, 3.) are also 
defined as words. Words that are capitalized in the mid- 
dle of sentences are treated as proper nouns. The pro- 
gram tries to distinguish possessives from the abbreviated 
form of “is”. Hyphenated words are treated as one word. 
Words that have not been identified as any of the above 
are looked up in a dictionary of 210 function words and 
140 irregular verbs. 

A sentence is defined as a string of words of length 
greater than 1 ending in one of the following punctuation 
marks: 

.?! 

PARTS uses heuristics to distinguish instances of the char-
acter "." that are not sentence end markers. The following
symbols are treated as commas:
.;:-() 

2.2. Suffix Phase 

The second program checks all words not assigned a 
word class by the first program for certain suffixes. 
PARTS uses 51 suffixes, some of which denote a unique 
word class; others denote a partial or dual word class. 
Any word that is not assigned a word class by either the 
first or second program is passed on to the third program 
as unknown. 

2.3. Assigning Word Classes 

The third program does most of the work. It first 
reads a whole sentence with partial word class assign- 
ments. It then calls a scanning routine for each dependent 
or independent clause in the sentence. If a sentence 
begins with a subordinate conjunction, scan assigns word 
classes until it finds the comma separating the dependent 



programming environment
readability grades:
        (Kincaid) 12.3  (auto) 12.8  (Coleman-Liau) 11.8  (Flesch) 13.5 (46.3)
sentence info:
        no. sent 335  no. wds 7419
        av sent leng 22.1  av word leng 4.91
        no. questions 0  no. imperatives 0
        no. nonfunc wds 4362  58.8%  av leng 6.38
        short sent (<17) 35% (118)  long sent (>32) 16% (55)
        longest sent 82 wds at sent 174; shortest sent 1 wds at sent 117
sentence types:
        simple 34% (114)  complex 32% (108)
        compound 12% (41)  compound-complex 21% (72)
word usage:
        verb types as % of total verbs
        tobe 45% (373)  aux 16% (133)  inf 14% (114)
        passives as % of non-inf verbs 20% (144)
        types as % of total
        prep 10.8% (804)  conj 3.5% (262)  adv 4.8% (354)
        noun 26.7% (1983)  adj 18.7% (1388)  pron 5.3% (393)
        nominalizations 2% (155)
sentence beginnings:
        subject opener: noun (63) pron (43) pos (0) adj (58) art (62) tot 67%
        prep 12% (39)  adv 9% (31)
        verb 0% (1)  sub_conj 6% (20)  conj 1% (5)
        expletives 4% (13)

Figure 1
STYLE Output



and independent clauses. In the sentence 

If too many germs are found, the milk is not safe 
to drink. 

scan is called first with 

If too many germs are found, 
and then with 

the milk is not safe to drink. 

scan first tries to find the verb in the clause, if it is 
obvious, by looking for auxiliary verbs or words assigned 
class verb. After noting whether it has the verb, it then 
starts at the beginning of the clause and looks for a noun, 
usually the subject of the clause. If no verb was found, 
PARTS uses the rule that subject and verb must agree in 
number to locate the verb in the clause. Finally it 
traverses the sentence word by word assigning word
classes using rules of English word order. This is a gen-
eral outline of how PARTS works; the details are
described in [2]. 

PARTS has been tested and found to be about 95% 
accurate on both graded and technical text. It works at a 
rate of about 340 words per second on a PDP11/70 run- 
ning the UNIX Operating System. Because PARTS uses 
only a small dictionary and general rules on English word 
order, it works on text about any subject and has been
used in a variety of ways to look at English text.

3. Programs that use PARTS

3.1. STYLE

The program STYLE[3] reads a document and prints 
a summary of readability indices, sentence length and 
type, word usage, and sentence openers. It may also be 
used to locate all sentences in a document longer than a 
given length, of readability index higher than a given 
number, those containing a passive verb, or those begin- 
ning with an expletive. Style measures have been built 
into the output phase of the programs that make up 
PARTS. Some of the measures are simple counters of the 
word classes found by PARTS; many are more compli- 
cated. For example, the verb count is the total number of 
verb phrases. This includes phrases like: 

has been going 
was only going 
to go 

each of which counts as one verb. Figure 1 shows the 
output of STYLE run on a technical paper about the 
UNIX programming environment [4]. The following is a
discussion of the five parts of the STYLE output. 


3.1.1. Readability Grades 

The first section of STYLE output consists of four 
readability indices. As Klare points out in [3], readability
indices may be used to estimate the reading skills needed 
by the reader to understand a document. The readability 
indices reported by STYLE are based on measures of sen- 
tence and word lengths. Although the indices may not 
measure whether the document is coherent and well 
organized, experience has shown that high indices indicate 
stylistic difficulty. Documents with short sentences and 
short words have low scores; those with long sentences and 
many polysyllabic words have high scores. The 4 formu- 
lae reported are the Kincaid Formula [5], the Automated 
Readability Index [6], the Coleman-Liau Formula [7] and 
a normalized version of the Flesch Reading Ease Score 
[8]. The formulae differ because they were experimen- 
tally derived using different texts and subject groups. 
Coke [9] found that the Kincaid Formula is probably the 
best predictor for technical documents. 

If a document has particularly difficult technical con- 
tent, especially if it includes a lot of mathematics, it is 
probably best to make the text very easy to read, i.e., to 
lower the readability index by shortening the sentences 
and words. This will allow the reader to concentrate on 
the technical content and not the long sentences. This is 
especially true with highly mathematical text, since the 
mathematics is not counted in calculating the readability 
scores. The writer should remember that these indices are
estimators; they should not be taken as absolute numbers.
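
To show how such indices follow from sentence and word length, here is a small Python sketch of two of the four formulae. The coefficients are those commonly published for the Automated Readability Index and the Coleman-Liau Formula; the paper does not give the constants STYLE uses, so treating them as the same is an assumption, and the function names are hypothetical. The two inputs are the average word length in letters and the average sentence length in words.

```python
def automated_readability_index(letters_per_word, words_per_sentence):
    """Automated Readability Index from two averages."""
    return 4.71 * letters_per_word + 0.5 * words_per_sentence - 21.43


def coleman_liau(letters_per_word, words_per_sentence):
    """Coleman-Liau grade level from the same two averages."""
    letters_per_100_words = 100.0 * letters_per_word
    sentences_per_100_words = 100.0 / words_per_sentence
    return (0.0588 * letters_per_100_words
            - 0.296 * sentences_per_100_words
            - 15.8)


# Averages taken from the "sentence info" section of Figure 1.
ari = automated_readability_index(4.91, 22.1)
cli = coleman_liau(4.91, 22.1)
print(f"auto {ari:.2f}  Coleman-Liau {cli:.2f}")
```

With the rounded averages from Figure 1, both results land within about 0.1 of the 12.8 and 11.8 reported there. The Kincaid and Flesch formulae also need syllable counts, which Figure 1 does not report, so they are omitted from the sketch.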

3.1.2. Sentence length and structure 

The next two sections of STYLE output deal with 
sentence length and structure. Almost all books on writ- 
ing style or effective writing emphasize the importance of 
variety in sentence length and structure for good writing. 
Although experts agree that these rules are important, not 
all writers follow them. 

The output sections labeled “sentence info” and
“sentence types” give both length and structure measures.
STYLE reports on the number and average length of both 
sentences and words, and number of questions and 
imperative sentences. The measures of non-function 
words are an attempt to look at the content words in the 
document. In English non-function words are nouns, 
adjectives, adverbs, and non-auxiliary verbs; function 
words are prepositions, conjunctions, articles, and auxili- 
ary verbs. Since most function words are short, they tend 
to lower the average word length. The average length of 
non-function words may be a more useful measure for 
comparing word choice of different writers than the total 
average word length. The percentages of short and long 
sentences measure sentence length variability. Short sen-
tences are those at least 5 words less than the average;
long sentences are those at least 10 words longer than the
average. Last in the sentence information section are the 
length and location of the longest and shortest sentences. 

Because of the difficulties in dealing with the many 
uses of commas and conjunctions in English, sentence type
definitions vary slightly from those of standard textbooks,
but still measure the same constructional activity. 

1. A simple sentence has one verb and no dependent 
clause. 

2. A complex sentence has one independent clause and 
one dependent clause, each with one verb. Complex 
sentences are found by identifying sentences that 
contain either a subordinate conjunction or a clause 
beginning with words like “that” or “who”. The 
preceding sentence has such a clause. 

3. A compound sentence has more than one verb and 
no dependent clause. Sentences joined by ";" are
also counted as compound. 

4. A compound-complex sentence has either several 
dependent clauses or one dependent clause and a 
compound verb in either the dependent or indepen- 
dent clause. 

Even using these broader definitions, simple sen- 
tences dominate many technical documents, but the exam- 
ple in Figure 1 shows variety in both sentence structure 
and sentence length. 
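
The four definitions can be read as a decision on two counts: the number of verbs (verb phrases) in the sentence and the number of dependent clauses. The sketch below is one possible reading in Python, not the logic of STYLE itself. The function name is hypothetical, the counts are assumed to come from an earlier analysis such as PARTS, and cases the definitions leave open (for example, a sentence in which no verb was found) are resolved arbitrarily.

```python
def sentence_type(verbs, dependent_clauses):
    """Classify a sentence from its verb count and dependent clause count."""
    if dependent_clauses == 0:
        # No dependent clause: one verb is simple, more than one is compound.
        return "simple" if verbs <= 1 else "compound"
    if dependent_clauses == 1 and verbs <= 2:
        # One independent and one dependent clause, each with one verb.
        return "complex"
    # Several dependent clauses, or one dependent clause plus a compound verb.
    return "compound-complex"


for counts in [(1, 0), (2, 0), (2, 1), (3, 1), (3, 2)]:
    print(counts, sentence_type(*counts))
```

Because the classification depends only on counts, it avoids having to decide how each comma or conjunction is being used, which is the difficulty the broader definitions are meant to sidestep.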

3.1.3. Word Usage 

The word usage measures are an attempt to identify 
some other constructional features of writing style. There 
are many different ways in English to say the same thing. 
The following sentences all convey approximately the same 
meaning but differ in word usage: 

The cxio program is used to perform all com- 
munication between the systems. 

The cxio program performs all communications 
between the systems. 

The cxio program is used to communicate 
between the systems. 

The cxio program communicates between the 
systems. 

All communication between the systems is per- 
formed by the cxio program. 

The distribution of the parts of speech and verb types 
helps identify overuse of particular constructions.
Although the measures used by STYLE are crude, they do 
point out problem areas. For each category, STYLE 
reports a percentage and a raw count. In addition to 
looking at the percentage, a writer may find it useful to 
compare the raw count with the number of sentences. If, 
for example, the infinitive count is almost equal to the 
sentence count, then many of the sentences in the docu- 
ment are constructed like the first and third in the preced- 
ing example. The writer may want to transform some of 
these sentences into another form. Some of the implica- 
tions of the word usage measures are discussed below. 



Verbs are measured in several different ways to try to 
determine what types of verb constructions are most 
frequent in the document. Technical writing tends 
to contain many passive verb constructions and other 
usage of the verb “to be”. The category of verbs 
labeled “tobe” measures both passives and sentences 
of the form: 

subject tobe predicate 

In counting verbs, whole verb phrases are counted as 
one verb. Verb phrases containing auxiliary verbs 
are counted in the category “aux”. The verb 
phrases counted here are those whose tense is not 
simple present or simple past. Infinitives are listed 
as “inf”. The percentages reported for these three 
categories are based on the total number of verb 
phrases found. Use of these verb constructions 
varies significantly among authors. 

STYLE reports passive verbs as a percentage of the 
finite verbs in the document. Most style books warn 
against the overuse of passive verbs. 

Pronouns add cohesiveness and connectivity to a document 
by providing back-reference. They are often a 
short-hand notation for something previously men- 
tioned, and therefore connect the sentence contain- 
ing the pronoun with the word to which the pronoun 
refers. Although there are other mechanisms for 
such connections, documents with no pronouns tend 
to be wordy and disconnected. 

Adverbs can provide transition between sentences and 
order in time and space. In performing these func- 
tions, adverbs, like pronouns, provide connectivity 
and cohesiveness. 

Conjunctions provide parallelism in a document by con- 
necting two or more equal units. These units may 
be whole sentences, verb phrases, nouns, adjectives, 
or prepositional phrases. The compound and 
compound-complex sentences reported under sen- 
tence type are parallel structures. The difference 
between the number of conjunctions reported under 
word usage and the sum of the compound and 
compound-complex sentences is a measure of other 
parallel structures in the document. 

Nouns and Adjectives. A ratio of nouns to adjectives near 
unity reflects the over-use of modifiers. Some techn- 
ical writers qualify every noun with one or more 
adjectives. Qualifiers in phrases like “simple linear 
single-link network model” often lend more obscu- 
rity than precision to a text. 

Nominalizations are verbs that are changed to nouns
by adding one of the suffixes “ment”, “ance”, 
“ence”, or “ion”. Examples are accomplishment, 
admittance, adherence, and abbreviation. When a 
writer transforms a nominalized sentence to a non- 
nominalized sentence, she/he increases the effective- 
ness of the sentence in several ways. The noun 
becomes an active verb and frequently one 
complicated clause becomes two shorter clauses. For 
example, 

Their inclusion of this provision is admission of 
the importance of the system. 

When they included this provision, they admitted 
the importance of the system. 

Writers who find their documents contain many 
nominalizations may want to transform some of the 
sentences to use active verbs. 
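
The suffix rule described above can be sketched in a few lines of Python. This is only an illustration, not the STYLE implementation: the function name and the minimum-length cutoff are assumptions introduced here, and a pure suffix test will also flag nouns that were never verbs.

```python
import re

# Suffixes named in the text. The length cutoff is an assumption added
# here to skip short words such as "lion" or "cement".
NOMINAL_SUFFIXES = ("ment", "ance", "ence", "ion")


def find_nominalizations(text, min_length=7):
    """Return the words that end in one of the nominalizing suffixes."""
    words = re.findall(r"[A-Za-z]+", text)
    return [w for w in words
            if len(w) >= min_length and w.lower().endswith(NOMINAL_SUFFIXES)]


sentence = ("Their inclusion of this provision is admission of "
            "the importance of the system.")
print(find_nominalizations(sentence))
```

Run on the nominalized example sentence, the sketch reports "inclusion", "provision", "admission", and "importance"; the rewritten sentence yields only "provision" and "importance".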

3.1.4. Sentence openers 

Another agreed upon principle of style is variety in 
sentence openers. Because STYLE determines the type of 
sentence opener by looking at the part of speech of the 
first word in the sentence, the sentences counted under the 
heading “subject opener” may not all really begin with the 
subject. However, a large percentage of sentences in this 
category still shows lack of variety in sentence openers. 
Other sentence opener measures help determine if there 
are transitions between sentences and where the subordi- 
nation occurs. Adverbs and conjunctions at the beginning 
of sentences are mechanisms for transition between sen- 
tences. A pronoun at the beginning shows a link to some- 
thing previously mentioned and indicates connectivity. 

The last category of openers, expletives, is com- 
monly overworked in technical writing. Expletives are the 
words “it” and “there”, usually with the verb “to be”, in 
constructions where the subject follows the verb. For 
example, 

There are three streets used by the traffic. 

There are too many users on this system. 
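
A rough Python sketch of an expletive-opener test follows. It is an assumption-laden approximation, not the STYLE method: it looks only at the first two words, the list of forms of "to be" is chosen here for illustration, and it cannot tell an expletive "it" from a true pronoun.

```python
import re

# Forms of "to be" checked after the opener; the list is an assumption.
BE_FORMS = {"is", "are", "was", "were", "be", "been", "being"}


def opens_with_expletive(sentence):
    """True if the sentence starts with "it" or "there" plus a form of "to be"."""
    words = re.findall(r"[A-Za-z']+", sentence.lower())
    return (len(words) >= 2
            and words[0] in ("it", "there")
            and words[1] in BE_FORMS)


print(opens_with_expletive("There are three streets used by the traffic."))
print(opens_with_expletive("The traffic uses three streets."))
```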

3.2. PROSE 

The PROSE program uses some of the statistics col- 
lected by STYLE to compare a document to standards cal- 
culated from other documents of the same type. The out- 
put of PROSE is a textual description of the input docu- 
ment describing how the document compares to good 
documents in the areas of readability, sentence structure 
and variation, use of passive verbs, nominalizations and 
expletives. In its output PROSE gives the writer sugges- 
tions for improving the document. Standards have been 
calculated for technical papers and training documents. 

3.3. REWRITE 

The REWRITE program is based on R. A. 
Lanham’s theory[10] of revising prose. Lanham’s theory 
in part is that to rewrite a text you should locate all prepo- 
sitions, forms of the verb "to be", and empty phrases. 
With these words emphasized you will find the sentences 
in the text with passive verbs, many extra words, and long 
strings of prepositional phrases that are candidates to be 
rewritten. REWRITE is a mechanized version of this 
theory intended to be used on the first draft of a 
document. It capitalizes the empty phrases found by DIC- 
TION (discussed below), and prepositions and forms of 
"to be" found by PARTS. The output is the document 
with the bad sentences visually obvious by virtue of the 
capitalization. 
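
The capitalization step can be illustrated with a short Python sketch. It is not the REWRITE program: the word lists are small samples chosen here, the function name is hypothetical, and plain word matching stands in for the part-of-speech information that PARTS supplies (so the "to" of an infinitive is capitalized as if it were a preposition).

```python
import re

# Small illustrative word lists; the real programs use PARTS and DICTION.
PREPOSITIONS = {"of", "in", "on", "by", "to", "with", "for", "at", "from"}
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}


def highlight(text, empty_phrases=()):
    """Capitalize empty phrases, prepositions, and forms of "to be"."""
    marked = text
    for phrase in empty_phrases:
        marked = re.sub(re.escape(phrase), phrase.upper(), marked,
                        flags=re.IGNORECASE)

    def upper_if_flagged(match):
        word = match.group(0)
        flagged = word.lower() in PREPOSITIONS | BE_FORMS
        return word.upper() if flagged else word

    return re.sub(r"[A-Za-z]+", upper_if_flagged, marked)


print(highlight("Their inclusion of this provision is admission of "
                "the importance of the system."))
```

On the nominalized sentence quoted earlier, the three occurrences of "of" and the verb "is" stand out in capitals, which is the visual cue the theory relies on.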

3.4. TOPIC 

The TOPIC program uses PARTS to locate frequent 
noun phrases in a document. The most frequent noun 
phrases usually are the topic or topics of the document 
and may be used as key words or indexes in the document. 


3.5. SPLITINF 

Another use of the PARTS output is the program 
SPLITINF, which finds split infinitives in the document. 

4. Copy Editing Programs 

4.1. DICTION and SUGGEST 

The program DICTION prints all sentences in a 
document containing phrases that are either frequently 
misused or suggest wordiness. The program, an extension 
of Aho’s FGREP [11] string matching program, takes as 
input a file of phrases or patterns to be matched and a file 
of text to be searched. A data base of about 450 phrases 
has been compiled as a default pattern file for DICTION. 
DICTION brackets all pattern matches in a sentence with 
the characters “*[” and “*]”. Although many of the phrases 
in the default data base are correct in some contexts, in 
others they indicate wordiness. Sometimes a simple sub- 
stitution improves the sentence, but often the presence of 
the phrase suggests muddled thinking and the whole sen- 
tence should be rewritten. Some examples of the phrases 
and suggested alternatives are: 


Phrase                      Alternative 

a large number of           many 
arrive at a decision        decide 
collect together            collect 
for this reason             so 
pertaining to               about 
through the use of          by or with 
utilize 
with the exception of 


Some of the entries are short forms of problem phrases. 
For example, the phrase “the fact” is found in all of the 
following and is sufficient to point out the wordiness to 
the writer: 


Phrase Alternative 

accounted for by the fact that caused by 

an example of this is the fact that thus 

based on the fact that because 

despite the fact that although 

due to the fact that because 

in light of the fact that because 

in view of the fact that since 

notwithstanding the fact that although 

Writers may supply their own pattern files to be used 
either in addition to the default file or alone. This 
mechanism allows users to suppress patterns contained in 
the default file or to include their own pet peeves that are 
not in the default file. Pattern files of commonly confused 
words and sexist terms are currently being compiled. 

SUGGEST is an interactive thesaurus for phrases 
found by DICTION. The user types one of the phrases 
bracketed by DICTION and SUGGEST responds with 
suggested substitutions for the phrase that will improve the 
diction of the document. 
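
The behavior of the two programs can be approximated as follows. This sketch is written under several assumptions: it uses a regular-expression alternation rather than the FGREP string matcher the text names, the pattern table holds only four of the phrases quoted above, the bracket strings are copied from the description and passed as parameters, and all function names are hypothetical.

```python
import re

# A few phrases and alternatives taken from the examples in the text.
DEFAULT_PATTERNS = {
    "a large number of": "many",
    "arrive at a decision": "decide",
    "collect together": "collect",
    "due to the fact that": "because",
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def diction(text, patterns=DEFAULT_PATTERNS, open_mark="*[", close_mark="*]"):
    """Return the sentences that contain a pattern, with matches bracketed."""
    # Longest phrases first, so a long phrase wins over a shorter one inside it.
    ordered = sorted(patterns, key=len, reverse=True)
    matcher = re.compile(
        r"\b(?:%s)\b" % "|".join(re.escape(p) for p in ordered),
        re.IGNORECASE)
    flagged = []
    for sentence in split_sentences(text):
        marked, count = matcher.subn(
            lambda m: open_mark + m.group(0) + close_mark, sentence)
        if count:
            flagged.append(marked)
    return flagged


def suggest(phrase, patterns=DEFAULT_PATTERNS):
    """Look up the suggested substitution for a bracketed phrase."""
    return patterns.get(phrase.lower().strip())


text = "We collect together a large number of samples. The results were good."
print(diction(text))
print(suggest("collect together"))
```

A user-supplied pattern file corresponds here to passing a different dictionary as `patterns`, either merged with the default table or in place of it.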

4.2. SPELL, DOUBLE, and PUNCT 

In addition to the SPLITINF program, the system 
has several programs that do copy editing. The UNIX 
SPELL program looks up each word of the document in a 
dictionary and reports words not found in the dictionary 
as possible misspellings. A writer may have her/his own list 
of acronyms and proper names to be excluded from the 
SPELL output. Another program called DOUBLE finds 
all occurrences of the same word twice in succession. This 
is a common error in computer-edited text and is often 
overlooked in proofreading. 
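
A check of this kind is easy to sketch in Python. The function below is an illustration, not the DOUBLE program; its name and return format are assumptions. Because the pattern allows any white space between the two words, it also catches a word repeated across a line break.

```python
import re

# A word, white space (including newlines), then the same word again.
DOUBLED = re.compile(r"\b([A-Za-z]+)\s+\1\b", re.IGNORECASE)


def find_doubles(text):
    """Return (word, character offset) for each immediately repeated word."""
    return [(m.group(1), m.start()) for m in DOUBLED.finditer(text)]


print(find_doubles("This is a a common error in\nin computer-edited text."))
```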

The program called PUNCT checks the document 
for unbalanced quotes or parentheses and for errors in cer- 
tain kinds of punctuation. The punctuation rules it 
enforces are strict standards like the placement of punctua- 
tion with respect to double quotes and the removal of 
spaces around dashes. 
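
The balance checks can be sketched as follows. This covers only parentheses and straight double quotes; the message wording and the function name are assumptions made for the illustration, and the other punctuation rules PUNCT enforces are not attempted.

```python
def check_balance(text):
    """Report unbalanced parentheses and an odd number of double quotes."""
    problems = []
    depth = 0
    for offset, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                problems.append("unmatched ')' at offset %d" % offset)
            else:
                depth -= 1
    if depth:
        problems.append("%d unclosed '('" % depth)
    if text.count('"') % 2:
        problems.append("odd number of double quotes")
    return problems


print(check_balance('He said "stop (now).'))
```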

5. Conclusions 

The system described here is in wide use at Bell 
Laboratories by both professional and nonprofessional 
writers. Writers report that they find the programs useful 
and feel that their writing has improved as a result of 
using the system. A formal study of the system’s effects 
on writers’ text is planned. 

6. Acknowledgements 

W. Vesterman consulted in the development of the 
STYLE and DICTION programs. M. D. McIlroy wrote 
the SPELL program. N. H. Macdonald wrote the 
PROSE, SPLITINF, DOUBLE, and PUNCT programs. 


7. References 


1. N. H. Macdonald, L. T. Frase, and S. A. Keenan, 
“Writer’s Workbench: Computer Programs for Text 
Editing and Assessment,” Bell Laboratories, Piscata- 
way, New Jersey (1980). 

2. L. L. Cherry, “PARTS - A System for Assigning 
Word Classes to English Text,” Computing Science 
Technical Report #81, 1980, Bell Laboratories, Mur- 
ray Hill, NJ 07974. 

3. L. L. Cherry and W. Vesterman, “Writing Tools - 
The STYLE and DICTION Programs,” Bell Labora- 
tories, Murray Hill, New Jersey (1980). 

4. B. W. Kernighan and J. R. Mashey, “The UNIX 
Programming Environment,” Software — Practice & 
Experience, 9, 1-15 (1979). 

5. E. A. Smith and P. Kincaid, “Derivation and vali- 
dation of the automated readability index for use 
with technical materials,” Human Factors, 1970, 12, 
457-464. 

6. J. P. Kincaid, R. P. Fishburne, R. L. Rogers, and 
B. S. Chissom, “Derivation of new readability for- 
mulas (Automated Readability Index, Fog count, 
and Flesch Reading Ease Formula) for Navy enlisted 
personnel,” Navy Training Command Research 
Branch Report 8-75, Feb., 1975. 

7. M. Coleman and T. L. Liau, “A Computer Reada- 
bility Formula Designed for Machine Scoring,” Jour- 
nal of Applied Psychology, 1975, 60, 283-284. 

8. R. Flesch, “A New Readability Yardstick,” Journal 
of Applied Psychology, 1948, 32, 221-233. 

9. E. U. Coke, private communication. 

10. Richard A. Lanham, Revising Prose, Charles 
Scribner’s Sons, New York, N. Y. (1979). 

11. A. V. Aho and M. J. Corasick, “Efficient String 
Matching: an aid to Bibliographic Search,” Commun- 
ications of the ACM, 18, (6), 333-340, June 1975. 



A Generalized Approach to Document Markup 


Dr. C. F. Goldfarb 

IBM Corporation 
San Jose, California, U.S.A 


The Markup Process 

Text processing and word processing systems typically 
require users to intersperse additional information in the 
natural text of the document being processed. This added 
information, called “markup,” serves two purposes: 

1. it separates the logical elements of the document; and 

2. it specifies the processing functions to be performed 
on those elements. 

Consider how the user of such a system marks up a doc- 
ument. There are three distinct steps, although he may 
not perceive them as such. 

1. First, he analyzes the information structure and other 
attributes of the document; that is, he identifies each 
meaningful separate element, and characterizes it as a 
paragraph, heading, ordered list, footnote, or some 
other element type. 

2. He then determines, from memory or a style book, the 
processing instructions (“controls”) that will produce 
the format desired for that type of element. 

3. Finally, he inserts the chosen controls into the text. 

Here is how the start of this paper looks when marked up 
with controls in the Script formatting language (1): 


Permission to copy without fee all or part of this material is 
granted provided that the copies are not made or distributed 
for direct commercial advantage, the ACM copyright notice 
and the title of the publication and its date appear, and 
notice is given that copying is by permission of the Associa- 
tion for Computing Machinery. To copy otherwise, or to 
republish, requires a fee and/or specific permission. 

© 1981 ACM 0-89791-043-5/81/0600/0068 $00.75 


.sk 1 
Text processing and word 
processing systems typically 
require users to intersperse 
additional information in 
the natural text of the document 
being processed. 
This added information, called 
"markup," serves two purposes: 
.tb 4 
.of 4 
.sk 1 
1.¬it separates the logical 
elements of the document; and 
.of 4 
.sk 1 
2.¬it specifies the processing 
functions to be 
performed on those elements. 
.of 0 
.sk 1 

The .SK, .TB, and .OF controls, respectively, cause the 
skipping of vertical space, the setting of a tab stop, and 
the offset, or “hanging indent,” style of formatting. (The 
not sign (¬) in each list item represents a tab code, which 
would otherwise not be visible.) 

Procedural markup like this has a number of disadvan- 
tages. For one thing, information about the document’s 
attributes is usually lost. If the user, for example, decides 
to center both headings and figure captions when format- 
ting, the “center” control will not indicate whether the 
text on which it operates is a heading or a caption. 
Therefore, if he wishes to use the document in an infor- 
mation retrieval application, search programs will be 
unable to distinguish headings — which might be very sig- 
nificant in information content — from the text of anything 
else that was centered. 

Procedural markup is also inflexible. If the user decides 
to change the style of his document (perhaps because he 
is using a different output device), he will need to repeat 
the markup process to reflect the changes. This will pre- 
vent him, for example, from producing double-spaced 
draft copies on an inexpensive computer line printer, while 
still obtaining a high quality finished copy on an expensive 
photocomposer. And if he wishes to accept 
competitive bids for the typesetting of his document, he 
will be restricted to those vendors that use the identical 
text processing system, unless he is willing to pay the cost 
of repeating the markup process. 

Moreover, markup with control words can be 
time-consuming and error-prone, and can require a high degree of 
operator training, particularly when complex typographic 
results are desired. This is true (albeit less so) even when 
a system allows defined control procedures (“macros”), 
since these must be added to the user’s vocabulary of 
primitive controls. The elegant and powerful TEX system 
(2), for example, which the American Mathematical Soci- 
ety is considering for standard use by its authors, includes 
some 300 primitive controls and macros in its basic 
implementation. 

These disadvantages of procedural markup are avoided by 
a markup schema due to C. F. Goldfarb, E. J. Mosher, 
and R. A. Lorie (3, 4). It is called the “Generalized 
Markup Language” (GML) because it does not restrict 
documents to a single application, formatting style, or 
processing system. GML is based on two novel postulates: 

1. Markup should describe a document’s structure and 
other attributes, rather than specify processing to be 
performed on it, as descriptive markup need be done 
only once and will suffice for all future processing. 

2. Markup should be rigorous, so the techniques avail- 
able for processing rigorously-defined objects like pro- 
grams and data bases can be used for processing 
documents as well. 

These postulates will be developed intuitively by examin- 
ing the properties of GML. 

Descriptive Markup 

With GML, the markup process stops at the first step: 
the user locates each significant element of the document 
and marks it with the mnemonic tag (“generic identifier”) 
that he feels best characterizes it. The processing system 
associates the markup with processing instructions in a 
manner that will be described shortly. 

Marked up in GML, the start of this paper looks like this: 


:p.Text processing and word 
processing systems typically 
require users to intersperse 
additional information in 
the natural text of the document 
being processed. 
This added information, called 
:q.markup::q., serves two purposes: 
:ol. 
:li.it separates the logical 
elements of the document; and 
:li.it specifies the processing 
functions to be 
performed on those elements. 
::ol. 

Each generic identifier (GI) is delimited by a colon (:) if 
it is at the start of an element, or by two colons (::) if it is 
at the end. A period (.) separates a GI from any text 
that follows it. The mnemonics P, Q, OL, and LI stand, 
respectively, for the element types paragraph, quotation, 
ordered list, and list item. 
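
The delimiter rules just stated are regular enough to recognize with a small scanner. The Python sketch below is an illustration only, not the DCF parser: the token format and function name are assumptions, attributes are taken to have the form name=value, and a colon immediately followed by a letter in ordinary text would be misread as a tag.

```python
import re

# ":" or "::", a GI, optional name=value attributes, optional closing period.
TAG = re.compile(r"(::?)([A-Za-z][A-Za-z0-9]*)((?:[ \t]+\w+=[^\s.:]+)*)\.?")


def tokenize(source):
    """Yield ('start', gi, attrs), ('end', gi) and ('text', string) tokens."""
    position = 0
    for match in TAG.finditer(source):
        text = source[position:match.start()]
        if text.strip():
            yield ("text", " ".join(text.split()))
        delimiter, gi, attributes = match.groups()
        if delimiter == "::":
            yield ("end", gi.lower())
        else:
            attrs = dict(pair.split("=", 1) for pair in attributes.split())
            yield ("start", gi.lower(), attrs)
        position = match.end()
    tail = source[position:]
    if tail.strip():
        yield ("text", " ".join(tail.split()))


sample = (":p.This added information, called\n"
          ":q.markup::q., serves two purposes:\n"
          ":ol.\n"
          ":li.it separates; and\n"
          ":li.it specifies.\n"
          "::ol.")
for token in tokenize(sample):
    print(token)
```

Note that the colon ending "two purposes:" is left in the text, because no GI follows it.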

This example has some interesting properties. Note that 
there are no quotation marks in the text; the processing 
for the quotation element generates them, and will distin- 
guish between opening and closing quotes if the output 
device permits. The comma that follows the quotation 
element is not actually part of it, but is brought inside the 
quotation marks during formatting. Similarly, sequence 
numbers are generated for the ordered list items. The 
source text, in other words, contains only information; 
characters whose sole role is to enhance the presentation 
are generated during processing. 

If, as postulated, descriptive markup like this suffices for 
all processing, it must follow that the processing of a doc- 
ument is a function of its attributes. The way text is 
composed offers intuitive support for this premise. For 
example, techniques like beginning chapters on a new 
page, italicizing emphasized phrases, and indenting lists, 
are employed to assist the reader’s comprehension by 
emphasizing the structural attributes of the document and 
its elements. 

From this we can construct a 3-step model of document 
processing: 

1. Recognition: An attribute of the document is recog- 
nized; for example, an element with a generic identifi- 
er of “footnote.” 


2. Mapping: The attribute is associated with a process- 
ing function. The footnote GI, for example, could be 
associated with a procedure that prints footnotes at 
the bottom of the page, or one that collects them at 
the end of the chapter. 


3. Processing: The chosen processing function is 
executed. 


Text formatting programs conform to this model. They 
recognize such elements as words and sentences, primarily 
by interpreting spaces and punctuation as implicit 
markup. Mapping is usually via a branch table. Process- 
ing for words typically involves determining the word’s 
width and testing for an overdrawn line, while processing 
for sentences might cause space to be inserted between 
them. 1 

In the case of low-level elements such as words and sen- 
tences, the user is normally given little control over the 
processing, and almost none over the recognition. Some 
formatters offer more flexibility with respect to 
higher-level elements like paragraphs, while those with 
powerful macro languages can go as far as to support 
descriptive markup. In terms of the document processing 
model, the advantage of descriptive markup is that it 
permits the user to define attributes — and therefore ele- 
ment types — not known to the formatter, and to specify 
the processing for them. 

For example, the GML markup sample just described 
includes the element types “ordered list” and “list item,” 
in addition to the more common “paragraph.” Built-in 
recognition and processing of such elements is unlikely. 
Instead, each will be recognized by its explicit markup, 
and mapped to a control procedure associated with it for 
the particular processing run. Both the control procedure 
itself, and the association with a GI, would be expressed 
in the system’s macro language. On other processing 
runs, or at different times in the same run, the association 
could be changed. The list items, for example, might be 
numbered in the body of a book, but lettered in an 
appendix. 
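
The three-step model, and the effect of changing the association between runs, can be sketched in Python. The mapping tables and procedure names are hypothetical stand-ins for procedures that would be written in the system's macro language.

```python
import string


def numbered(index, text):
    """Control procedure: arabic sequence numbers."""
    return "%d. %s" % (index, text)


def lettered(index, text):
    """Control procedure: lower-case letters (first 26 items only)."""
    return "%s. %s" % (string.ascii_lowercase[index - 1], text)


def format_list(items, mapping):
    """Apply recognition, mapping, and processing to (gi, text) pairs."""
    output, counter = [], 0
    for gi, text in items:            # recognition: the GI names the element type
        procedure = mapping[gi]       # mapping: GI to control procedure
        counter += 1
        output.append(procedure(counter, text))   # processing
    return output


BODY_STYLE = {"li": numbered}         # association used in the body of a book
APPENDIX_STYLE = {"li": lettered}     # association used in an appendix

items = [("li", "it separates the logical elements"),
         ("li", "it specifies the processing functions")]
print(format_list(items, BODY_STYLE))
print(format_list(items, APPENDIX_STYLE))
```

The marked-up items are identical in both calls; only the table that associates the GI with a procedure differs.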

So far the discussion has addressed only a single attribute, 
the generic identifier, whose value characterizes an ele- 
ment’s semantic role or purpose. Some descriptive mark- 
up schemes refer to markup as “generic coding,” because 
the GI is the only attribute they recognize (5). In generic 
coding schemes, recognition, mapping, and processing can 
be accomplished all at once by the simple device of using 
GIs as control procedure names. Different formats can 
then be obtained from the same markup by invoking a 
different set of homonymous procedures. This approach is 
effective enough that one notable implementation, the 
SCRIBE system, is able to prohibit procedural markup 
completely (9). 

Generic coding is a considerable improvement over proce- 
dural markup in practical use, but it is conceptually 
unsound. This is because documents are complex objects, 
and they have other attributes that a markup language 
must be capable of describing. For example, suppose the 
user decides that his document is to include elements of a 
type called “figure,” and that it must be possible to refer 
to individual figures by name. The markup for a partic- 
ular figure element known as “angelfig” could begin like 
this: 

:fig id=angelfig 

“Fig,” of course, stands for “figure,” and is the value of 
the generic identifier attribute. The GI identifies the 
element as a member of a set of elements having the 
same role. In contrast, the “unique identifier” (ID) attri- 
bute distinguishes the element from all others, even those 
with the same GI. (It was unnecessary to say “GI=fig,” 
as was done for ID, because in GML it is understood that 
the first piece of markup for an element is the value of its 
GI). 

The GI and ID attributes are termed “primary” because 
every element can have them. There are also “secondary” 
attributes that are possessed only by certain element 
types. For example, if the user wanted some of the fig- 
ures in his document to contain illustrations to be pro- 
duced by an artist and added to the processed output, he 
could define an element type of “artwork.” Because the 
size of the externally-generated artwork would be impor- 
tant, he might define artwork elements to have a 
secondary attribute, “depth.” 2 This would result in the fol- 
lowing markup for a piece of artwork 24 picas deep: 

:artwork depth=24p 


The markup for a figure would also have to describe its 
content. “Content” is, of course, a primary attribute, and 
is the one that the secondary attributes of an element 
describe. The content consists of an arrangement of other 
elements, each of which in turn may have other elements 
in its content, and so on until further division is impossible. 3 
One way in which GML differs from generic coding 
schemes is in the conceptual and notational tools it pro- 
vides for dealing with this hierarchical structure. These 
are based on the second GML hypothesis, that markup 
can be rigorous. 

1 The model need not be reflected in the program architecture; processing of words, for example, could be built into the main recognition 
loop to improve performance. 

2 “Depth=” is not simply the equivalent of a vertical space control word. Although a full-page composition program could produce the 
actual space, a galley formatter might print a message instructing the layout artist to leave it. A retrieval program might simply index 
the figure and ignore the depth entirely. 

3 This explains why we can speak of documents and elements almost interchangeably: the document is simply the element that is at the 
top of the hierarchy for a given processing run. A technical report, for example, could be formatted both as a document in its own right, 
or as an element of a journal. 

Rigorous Markup 

Assume that the content of the figure “angelfig” consists 
of two explicitly marked up elements (“explicit 
elements”), a figure body and a figure caption. The fig- 
ure body in turn contains an artwork element, while the 
content of the caption is text with no explicit markup (an 
“implicit element”). The markup for this figure could 
look like this: 4 

:fig id=angelfig 
:figbody 
:artwork depth=24p 
::artwork 
::figbody 
:figcap.Three Angels Dancing 
::figcap 
::fig 

The markup rigorously expresses the hierarchy by identi- 
fying the beginning and end of each element in classical 
left list order. No additional information is needed to 
interpret the structure, and it would be possible to imple- 
ment support by the simple scheme of macro invocation 
discussed earlier. The price the user pays for this simplic- 
ity, though, is that he must enter explicit markup for the 
end of every explicit element. 

This price is totally unacceptable in practice. The user 
knows that the start of a paragraph, for example, termi- 
nates the previous one, so he will be reluctant to go to the 
trouble and expense of entering an explicit ending tag for 
every single paragraph just to share his knowledge with 
the system. He will have equally strong feelings about 
other element types he may define for himself, if they 
occur with any great frequency. 

With GML markup, though, it is possible to advise the 
system about the attributes of any type of element the 
user creates. This is done by creating a formal definition, 
or “model,” using a notation derived from the BNF nota- 
tion used for formal grammars. While the markup in a 
document consists of descriptions of individual elements, a 
GML model defines the set of all possible valid 
descriptions of a type of element. 


As an example, suppose the user extends his definition of 
“figure” to permit the figure body to contain certain 
kinds of textual elements instead of artwork. The GML 
model might look like this: 5 

1. :fig id=? Sdef. figbody, figcap? 

2. :figbody Sdef. artwork / (p / ol / ul)* 

3. :artwork depth= Sdef. EXTERNAL 

4. :figcap Sdef. word, (PUNCTUATION, word)*? 

5. word Sdef. CHARACTER* 


Line 1 means that a figure may optionally have the ID 
attribute specified, and that it can contain a figure body 
and, optionally, a figure caption following the figure body. 
Line 2 says the body may contain either artwork or an 
intermixed collection of paragraphs, ordered lists, and 
unordered lists. Line 3 defines artwork as requiring spec- 
ification of the depth attribute, and having an externally 
generated content. Line 4 defines a figure caption's con- 
tent as 1 or more words, separated by punctuation. Line 
5 defines a word as containing 1 or more characters. 
EXTERNAL, PUNCTUATION, and CHARACTER are 
treated as terminals, incapable of further division. (It is 
assumed that P, OL, and UL have been defined 
elsewhere.) 6 
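
Because a model of this kind is close to a regular expression, validation can be sketched by translating each content model by hand. The Python below is an illustration under stated assumptions, not part of GML: the tables and the function name are invented here, only lines 1 to 3 of the model are covered, and the asterisk is rendered as "+" because the text reads it as one or more occurrences.

```python
import re

# Content models for lines 1-3, rewritten as regular expressions over
# child names that are each followed by one space.
CONTENT_MODELS = {
    "fig": r"figbody (figcap )?",
    "figbody": r"(artwork |((p|ol|ul) )+)",
    "artwork": r"",                       # EXTERNAL: no marked-up content
}
REQUIRED_ATTRIBUTES = {"artwork": {"depth"}}   # "depth=" carries no "?"
OPTIONAL_ATTRIBUTES = {"fig": {"id"}}          # "id=?" is optional


def is_valid(gi, attributes, children):
    """Check one element's attributes and the sequence of its child GIs."""
    required = REQUIRED_ATTRIBUTES.get(gi, set())
    allowed = required | OPTIONAL_ATTRIBUTES.get(gi, set())
    if not required <= set(attributes) <= allowed:
        return False
    sequence = "".join(child + " " for child in children)
    return re.fullmatch(CONTENT_MODELS[gi], sequence) is not None


print(is_valid("fig", {"id": "angelfig"}, ["figbody", "figcap"]))
print(is_valid("figbody", {}, ["artwork", "p"]))
print(is_valid("artwork", {}, []))
```

The second call fails because the body must hold either artwork or textual elements, not both; the third fails because the depth attribute is missing.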

With this formal definition of figure elements available, 
the following markup for “angelfig” is now acceptable: 

:fig id=angelfig 
:figbody 
:artwork depth=24p 
:figcap.Three Angels Dancing 
::fig 


There has been a 40% reduction in markup, since three 
ending generic identifiers are no longer needed. As the 
model defines the figure caption as part of the contents of 
a figure, terminating the figure automatically terminated 
the caption. Since the figure caption itself is on the same 
level as the figure body, the :figcap tag implicitly termi- 
nated the figure body, and therefore the artwork element 
contained in it. 
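
The inference of omitted ending tags can be sketched with a stack of open elements. This Python is a simplified illustration, not the GML algorithm: the table records only which element types may appear directly inside each type (ignoring order and repetition), attributes and text are left out, and the names are assumptions.

```python
# Which element types may appear directly inside each type,
# taken from the example model.
CHILDREN = {
    "fig": {"figbody", "figcap"},
    "figbody": {"artwork", "p", "ol", "ul"},
    "artwork": set(),
    "figcap": set(),
}


def expand(tags):
    """Return the tag list with every implied ending tag written out."""
    stack, full = [], []
    for tag in tags:
        if tag.startswith("::"):
            gi = tag[2:]
            if gi not in stack:
                raise ValueError("end tag without start: " + tag)
            while True:               # close whatever is still open inside gi
                top = stack.pop()
                full.append("::" + top)
                if top == gi:
                    break
        else:
            gi = tag[1:]
            # An open element that cannot contain gi must already have ended.
            while stack and gi not in CHILDREN[stack[-1]]:
                full.append("::" + stack.pop())
            stack.append(gi)
            full.append(tag)
    while stack:                      # end of input closes everything
        full.append("::" + stack.pop())
    return full


print(expand([":fig", ":figbody", ":artwork", ":figcap", "::fig"]))
```

Given the five tags of the minimized markup, the sketch restores the eight tags of the fully marked-up figure, in the same order.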

4 Like “GI=,” “content=” can safely be omitted. It is unnecessary when the content is externally generated, it is understood when the 
content consists solely of explicit elements, and for implicit elements it is implied by the period that separates markup at the start of an 
element from the following text. 

5 The question mark (?) means an element is optional, the comma (,) that it follows the preceding element in sequence, and the asterisk 
(*) that the element can occur 1 or more additional times. The stroke (/) is used to separate alternatives. Parentheses are used for 
grouping as in mathematics. 

6 Some complete, practical definitions may be found in (4). 

GML models have uses in addition to markup minimiza- 
tion. They can be used to validate the markup in a doc- 
ument before going to the expense of processing it, or to 
drive prompting dialogues for users unfamiliar with a 
document type. For example, a document entry applica- 
tion could read the model of a figure element and invoke 
control procedures for each element type. The control 
procedures would issue messages to the terminal prompt- 
ing the user to enter the figure ID, the depth of the 
artwork, and the text of the caption. They would also 
enter the markup itself into the document being created. 

Even in systems not sophisticated enough to read a model 
directly, knowledge of the model can be built into the 
control procedures. For example, in the case of our figure 
elements, the figure caption control procedure would ter- 
minate the figure body processing, and set a switch. If 
the end-of-figure procedure found the switch not set it 
would recognize that this instance of a figure had no cap- 
tion, and would invoke the end-of-body procedure itself. 
The result is as if the system had read the model, 
although at the penalty of increasing the complexity of 
the control procedures. 

Conclusion 

Regardless of the degree of accuracy and flexibility in 
document description that GML makes possible, the con- 
cern of the user who prepares documents for publication 
is still this: can GML, or any descriptive markup scheme, 
achieve typographic results comparable to procedural 
markup? A recent publication by Prentice-Hall Interna- 
tional (6) represents empirical corroboration of the GML 
hypotheses in the context of this demanding practical 
question. 

It is a textbook on software development containing hun- 
dreds of formulas in a symbolic notation devised by the 
author. Despite the typographic complexity of the mate- 
rial (many lines, for example, had a dozen or more font 
changes), no procedural markup was needed anywhere in 
the text of the book. It was marked up using the stand- 
ard GML supplied with a program called the “Document 
Composition Facility” (DCF) (4). To this the author 
added some element types required for textbooks (such as 
“exercise”), and mnemonic names for “constant” elements 
like mathematical and logical symbols. 

The available control procedures supported only computer 
output devices, which were adequate for the book’s pre- 
liminary versions that were used as class notes. No con- 
sideration was given to typesetting until the book was 
accepted for publication, at which point its author balked 
at the time and effort required to re-keyboard and proof- 
read some 350 complex pages. He began searching for an 
alternative at the same time the author of this paper 
sought an experimental subject to validate the applicabil- 
ity of GML to commercial publishing. 

In due course both searches were successful, and an unu- 
sual project was begun. As the GML processor, DCF, 
does not support photocomposers directly, procedures were 
written that created a source file with procedural markup 
for a separate typographic composition program, TERM- 
TEXT/Format (7). Formatting specifications were 
provided by the publisher, and no concessions were needed 
to accommodate the use of GML, despite the markup 
having existed before the specifications. 7 

The experiment was completed on time, and the publisher 
considers it a complete success (8). 8 The control proce- 
dures, with some modification to the formatting style, 
have found additional use in the production of a variety of 
in-house publications. 

GML, then, has both practical and academic benefits. In 
the publishing environment, it reduces the cost of markup, 
cuts lead times in book production, and offers maximum 
flexibility from the text data base. At the same time, its 
rigorous descriptive markup makes text more accessible 
for computer analysis. While procedural markup (or no 
markup at all) leaves a document as a character string 
that has no form other than that which can be deduced 
from analysis of the document’s meaning, GML markup 
reduces a document to a regular expression in a known 
grammar. This permits established techniques of compu- 
tational linguistics and compiler design to be applied to 
natural language processing and other document process- 
ing applications. 

Acknowledgments 

The author is indebted to E. J. Mosher, R. A. Lorie, T. I. 
Peterson, and A. J. Symonds — his colleagues during the 
early development of GML — for their many contributions 
to the ideas presented in this paper, to N. R. Eisenberg 
for his collaboration in the design and development of the 
control procedures used to validate the applicability of 
GML to commercial publishing, and to C. B. Jones and 
Ron Decent for risking their favorite book on some new 
ideas. 

References 

1 Document Composition Facility: User’s Guide, Form 
No. SH20-9161-0, IBM Corporation, White Plains 
1978. 

2 Donald E. Knuth, TAU EPSILON CHI, a system for 
technical text, American Mathematical Society, Provi- 
dence, 1979. 


7 On the contrary, the publisher took advantage of GML by changing some of the specifications after he saw the page proofs.

* This despite some geographical complications: the publisher was in London, the book’s author in Brussels, and this paper’s author in
California. Almost all communication was done via an international computer network, and the project was nearly completed before all 
the participants met for the first time. 



3 C. F. Goldfarb, E. J. Mosher, and T. I. Peterson, “An 
Online System for Integrated Text Processing,” Pro- 
ceedings of the American Society for Information Sci- 
ence, 7, 147-150 (1970). 

4 Charles F. Goldfarb, Document Composition Facility 
Generalized Markup Language: Concepts and Design 
Guide, Form No. SH20-9188-0, IBM Corporation, 
White Plains, 1980. 

5 Charles Lightfoot, Generic Textual Element 
Identification— A Primer, Graphic Communications 
Computer Association, Arlington, 1979. 


6 C. B. Jones, Software Development: A Rigorous 
Approach, Prentice-Hall International, London, 1980. 

7 TERMTEXT/Format Language Guide, Form No. 
SH20-1372-1, IBM Corporation, White Plains, 1976. 

8 Ron Decent, personal communication to the author 
(September 7, 1979). 

9 B. K. Reid, “The Scribe Document Specification Lan- 
guage and its Compiler,” Proceedings of the Interna- 
tional Conference on Research and Trends in Document 
Preparation Systems, 59-62 (1981). 




PEN: A Hierarchical Document Editor 

Todd Allen, Robert Nix, Alan Perlis 

Computer Science Department 
Yale University 


1. Introduction

Three terms in common usage in computerized text processing are 
text-editing, word-processing, and computer controlled typesetting. 
This paper deals with a fourth term, manuscript preparation, that has 
important intersections with the above three. A computerized 
manuscript preparation system is one that supports an author in the 
preparation of a manuscript. In what follows we deal with one such, the 
PEN system, directed towards the preparation of manuscripts 
containing significant mathematical notation. 

A partial list of desiderata for such a system is: 

• It should be interactive. 

• It should not be unduly restricted by previous dependency on 
paper as the display medium. 

• It should not penalize too severely the entry of mathematical 
text. 

• A character string representation of the text should be
available for archiving and network transmission. 

• It should support programming and computation. 

• It should be capable of adapting text to a reasonable output 
representation on paper. 

• It should support a reasonable variety of alphabets and fonts. 

• It should make rational use of techniques developed for other 
and allied purposes. 

PEN attempts to satisfy these desiderata. 

PEN focuses on the needs of a scientist engaged in the writing of a 
technical manuscript, a highly structured document that uses a large 
variety of alphabets and fonts to convey information. It is often a 
trying experience to prepare a manuscript for publication. Some 
existing systems, e.g. TEX [6] and TROFF/EQN [4, 9], help a great 
deal by supplying the author with primitives for specifying the formulae 
in his document. The major drawback of these systems is that they 
require the user to specify a large amount of formatting information to
lay out a small amount of content.

This work was supported by a Joint Study with IBM.

Permission to copy without fee all or part of this material is granted
provided that the copies are not made or distributed for direct
commercial advantage, the ACM copyright notice and the title of the
publication and its date appear, and notice is given that copying is by
permission of the Association for Computing Machinery. To copy
otherwise, or to republish, requires a fee and/or specific permission.

© 1981 ACM 0-89791-043-5/81/0600/0074 $00.75

Scribe [10, 11], although lacking formatting proficiency in 
mathematics, has shown how a hierarchical model of document 
structure can be exploited to reduce greatly the amount of necessary 
formatting detail. A Scribe document is tree-structured. The nodes in
the tree are “chapters,” “sections,” “quotations,” “descriptions,” etc.
Format parameters like indentation or font are scoped according to the
tree. Reasonable values for these parameters are specified in the node
definitions; they may be overridden when a node is instantiated
(although this is not usually done). A large database of common node
definitions lets a Scribe user format the textual portions of his technical 
document without supplying much formatting detail. Unfortunately, 
Scribe does not help much when it comes to mathematics. 

PEN is an attempt to construct an interactive formatter that
combines the best features of the above. PEN displays the formatted
document as it is being typed, giving the user immediate visual feedback
(much as in Bravo [7]). This feedback is extremely important when the
user is typing technical text; mathematics can be very hard to visualize, 
even when expressed in the best input language. PEN further simplifies 
mathematical text entry by providing a concise notation for specifying 
objects. This notation is the principal original contribution of PEN. 

While one expects the visual feedback offered by PEN to be its most 
salient feature, it has other important advantages: 

• PEN can provide the user with access to the organizational 
power present in document compilers like Scribe and TEX. 
These formatters hierarchically structure manuscripts, collate 
references, form indexes, number figures and so on. 
Unfortunately, this power is not accessible while developing a 
document; it is available only after a compilation. An 
integrated, interactive system like PEN can make these 
information processing capabilities interactively available to 
the user. 

• Many features of standard documents, notably inter- and 
intra- document reference, result from clumsy circumventions 
of the limitations of paper. These concepts can be captured 
more naturally by exploiting the dynamic windowing 
capabilities of the display. For example, when the user 
encounters a reference to an equation, say “(27)”, the equation 
itself may be shown in another window (relieving much of the 
mystery of such references.) 

• There is no reason that a document should be static. A
time-varying function, for example, can be shown as actually 
varying with time. A program description can be runnable. 

One can easily imagine many other ways that the computer can help, 
and extend, the manuscript development process. 

This paper begins by describing the structure of the non-mathematical 
portions of PEN. We then present our notation for specifying and 


74 



y in 
ument 
:essary 
)des in 
" etc.. 
to the 
-• node 



plifies 

ifying 

V. 

most 


ite 

>n. 

I a 
\n 

‘.SC 

to 

id 

ns 

:d 

ig 

er 

m 

he 

A 

iy 

help, 

atical 
l and 


formatting mathematical objects. The motivations behind this design 
are discussed in Section 4. We conclude the paper with a comment on 
our experience with Lisp as a systems development language. 

2. PEN 

PEN runs in an environment that can support the features we desire in a 
system for interactive manuscript preparation. It is designed to run on 
a medium-sized personal computer with a large bit-mapped screen, such 
as a Three Rivers PERQ or Xerox PARC’s Alto [13]. This
environment provides the primitives needed for a sophisticated user 
interface: it supports many and varied fonts, interactive proportional 
spacing, multiple windows, and the other features we want. 

Many user interfaces are possible [2, 3, 7, 14]; we will not consider them 
further here. Rather, we present the organization behind our interface. 
While the user interface will determine the eventual success of PEN, the 
internal organization determines the flexibility and usability of that
interface. User interfaces are highly dependent on the characteristics of 
available hardware, such as pointing devices, screen size, and screen 
refreshability. PEN’s internal structures generalize beyond the
hardware used, while the user interface does not. 

2.1. Document Structure 

A PEN document is represented as a tree structure. The nodes of the 
tree are things like “chapters,” “sections” and “paragraphs.” The tree is 
the central object in PEN, the focus of all edit, format, and display 
operations. 

In Scribe there is little restriction as to which environments may be 
placed within others. In PEN, however, the tree structure is very strictly 
defined. Each node has a type associated with it that describes the 
number and type of children for that node; these node types are 
analogous to Pascal record types. For example, a node of type 
“chapter” might only allow a varying number of children of type 
“section”; a node of type “section” might allow exactly two children, a 
“title” and a “list-of-paragraphs.” 

Nodes are defined through a template, a prototype for a class of nodes 
in the tree. This template, or node definition, declares an object with
several attributes: 

• The structure of its children. This describes the type and 
number of children this node can have. In PEN, nodes come 
in one of four flavors: and, or, vary, or leaf. In an and node,
there are a fixed number of children, each of fixed type. An or 
node has exactly one child, whose type may be any one of a set 
of types. A vary node has a bounded but variable number of 
children, all of one type. Finally, a leaf node has no children 
and is of primitive type. Restricting nodes in this way does not 
affect the class of structures that can be expressed, although it 
may add layers to this expression. This restriction vastly 
simplifies code for processing these trees. 

• The parameters of interest to this node. Parameters specify 
attributes ranging from the textual content of the node to the 
width of a line of text. The template gives default values of 
parameters that may be overridden when the node is 
instantiated. When a parameter is referenced within an 
instantiation of a template, the system looks first at any 
overriding values specified in the instantiation, then back at 
the template, and so on up the tree, until it finds a value for the 
parameter. This value is normally dynamically bound, 
although there are provisions for offering different binding 
schemes. Editing a document amounts to changing the 
parameters specified on the tree. 

• A formatting function for this node. In a leaf node, this 
function may perform an arbitrary formatting task. In a 
non-leaf node, it is restricted to putting the node’s children 
together either vertically or horizontally. This restriction
simplifies the task of incrementally reformatting the tree. 

• Functions that perform other tasks for this node. This is a list
of (message, implementing function) pairs. The message
concerns generic functions this object can respond to, and the
function details this node’s response. One message, for
example, asks a node to map a location on the display to one
of its components (which may, in turn, be asked to do the
same thing).

• A default instance for this node. This is intended to be a 
complete, but sketchy, example of the node which may be 
expanded by the user. This permits the standard technique of 
writing a highly stylized document by “filling in the blanks.” 

Type definitions can also be built up from simpler ones. A type may be 
defined as being another type with some of that template’s parameters 
modified. 
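
The four node flavors and the parameter search order described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the names Template, Node, allows and lookup are hypothetical, the bound on the number of children of a vary node is omitted, and PEN itself was written in Lisp.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Template:
    """A node definition: child structure plus default parameter values."""
    name: str
    kind: str                       # one of "and", "or", "vary", "leaf"
    child_types: tuple = ()         # names of the permitted child types
    defaults: dict = field(default_factory=dict)

    def allows(self, children: list) -> bool:
        """Check a list of child type names against this template's flavor."""
        if self.kind == "leaf":     # no children at all
            return not children
        if self.kind == "and":      # fixed number of children, each of fixed type
            return tuple(children) == self.child_types
        if self.kind == "or":       # exactly one child, drawn from a set of types
            return len(children) == 1 and children[0] in self.child_types
        if self.kind == "vary":     # variable number of children, all of one type
            return all(c == self.child_types[0] for c in children)
        raise ValueError(f"unknown node kind: {self.kind}")


@dataclass
class Node:
    """An instantiated template, with optional overriding parameter values."""
    template: Template
    overrides: dict = field(default_factory=dict)
    parent: Optional["Node"] = None

    def lookup(self, param: str) -> Any:
        """Search the instantiation, then its template, and so on up the tree."""
        node = self
        while node is not None:
            if param in node.overrides:
                return node.overrides[param]
            if param in node.template.defaults:
                return node.template.defaults[param]
            node = node.parent
        raise KeyError(param)


chapter = Template("chapter", "vary", ("section",), {"line_width": 72, "font": "roman"})
section = Template("section", "and", ("title", "list-of-paragraphs"), {"indent": 2})
ch = Node(chapter)
sec = Node(section, {"font": "italic"}, parent=ch)
```

Here sec.lookup("font") is answered by the instantiation, sec.lookup("indent") by the section template, and sec.lookup("line_width") by the enclosing chapter.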

The template specifies a tree that is a syntactic description of the 
high-level organization of a particular type of document. Node 
definitions are the declarative format specification language of PEN. 
While it is possible to describe the structure of text all the way down to 
its characters, such a description would not be of practical use. It seems 
natural to break a document up into chapters, sections and even 
paragraphs; however, the user gains little from finer distinctions. PEN’s
tree stops at the paragraph level, where formatting by simple horizontal 
and vertical juxtaposition breaks down. 

The inner nodes of a manuscript tree describe its form; its leaves contain 
the content. The average leaf node appears as a chunk of filled and 
justified text, although the only restriction put on a leaf is that it be able
to format itself. The content of a “text” leaf is represented by a list of 
objects containing words, mathematical formulae, functions generating 
text, and so on. The leaf may have an arbitrarily complicated internal 
structure. Some other kinds of leaves in PEN are: dates, graphical 
figures, mathematical equations, programs, and so on. 

The syntactic restrictions imposed by the node definition may appear to 
be unduly strict, but in practice they aren’t. For example, a “generic 
paragraph” type may be defined as an or node that includes all the types 
commonly found in running text. 

The nodes that actually make up the tree are instantiated versions of 
templates. These nodes have several additional, optional, attributes: a 
formatted object representing all or part of the node; information about 
the display state of the node; and a user-supplied name. PEN’s “file
system” is built from a forest of named objects. 

2.2. Formatting Primitives 

The box concept developed by Knuth for use in the TEX document 
compiler [6] is a general data structure representing formatted text. A 
simple box is a rectangle that might contain a character. More 
complicated boxes are built by putting these simple boxes together in 
various ways. A short description of a box is given here; Knuth’s TEX 
manual gives a far more complete description. Many of the features of 
TEX’s boxes are not used in PEN. In particular, since PEN is 
interactive, it does not use the features geared towards doing “optimal” 
formatting. 

A box wraps a rectangle around an anonymous object which we call the 
content of the box. It is this anonymity that makes boxes so powerful; 
formatting can be done independent of the content. Content is 
positioned within an enclosing box by filling the inner border of the box 
with a certain amount of glue. Glue also determines how the box reacts 
to being stretched, shrunk, or hinged. Boxes have two baselines that 
determine how they line up when juxtaposed: putting two boxes side by 
side will line them up along the horizontal baseline; putting one on top
of the other lines them up along the vertical baseline. 

Use of the box data structure drastically simplifies most of the low-level 
aspects of formatting. A few primitive box manipulation functions 
form the basis for more sophisticated formatting operations. We list 
the primitives used in the system and describe the horizontal variety; the
vertical primitives have analogous definitions.

Compose: takes two boxes and puts them together to make a new one.
A horizontal composition aligns two boxes along their
horizontal baseline and forms a new box around them.

Hinge: makes a box conform to a particular size by breaking it up.
A hinge will recursively try to break a box along its side to
make it conform to a supplied dimension. A box may be
marked as being unhingeable. A box that contains a word
could be “hyphenated” when hinged.

Stretch: makes a box conform to a size by stretching or shrinking it.
A horizontal stretch of an atomic box will stretch it by the
indicated amount if it is stretchable, otherwise it will leave it
alone. Stretching a horizontal composition will distribute
the desired expansion through all stretchable components in
proportion to their ability to stretch. It will impose the
same amount of growth on all members of a vertical
composition. Stretching is used for justification, centering
and other operations where an object is made to conform to
a size.

Align: adjusts baselines to control composition. This can be used,
for example, to ensure that two pieces of text are aligned
along their top by changing the baseline to be the top of the
hunks of text.

A box is either atomic (e.g. a character), or a composition of other 
boxes. The formatted representation of any part of a document would 
be a tree of compositions of these boxes, called a box tree. This tree 
corresponds to the document tree; the box of a node contains the boxes 
of its children. 
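
As a rough illustration of the Compose and Stretch primitives, the following Python sketch models a box by its width, its height above and depth below the horizontal baseline, and a single number for its ability to stretch. The field names and the functions hcompose and hstretch are our own assumptions for the example, and only horizontal compositions are handled; glue, hinging and vertical baselines are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    width: float
    height: float            # extent above the horizontal baseline
    depth: float = 0.0       # extent below the horizontal baseline
    stretch: float = 0.0     # ability to stretch horizontally (0 means rigid)
    children: list = field(default_factory=list)


def hcompose(left: Box, right: Box) -> Box:
    """Align two boxes on their horizontal baseline and wrap a new box around them."""
    return Box(
        width=left.width + right.width,
        height=max(left.height, right.height),
        depth=max(left.depth, right.depth),
        stretch=left.stretch + right.stretch,
        children=[left, right],
    )


def hstretch(box: Box, extra: float) -> None:
    """Spread `extra` width over stretchable components in proportion to their stretch."""
    if box.stretch == 0:
        return                       # rigid box: leave it alone
    if not box.children:
        box.width += extra           # atomic stretchable box
        return
    for child in box.children:
        hstretch(child, extra * child.stretch / box.stretch)
    box.width = sum(child.width for child in box.children)


word1 = Box(30, 10, 2)
space = Box(5, 0, 0, stretch=1.0)
word2 = Box(20, 8, 3)
line = hcompose(hcompose(word1, space), word2)   # width 55
hstretch(line, 10)                               # justify the line to width 65
```

Because composition reads only sizes and baselines, the same two functions serve whether the atomic boxes hold characters, words or whole formulae.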

2.3. Incremental Formatting and Display 

The hardest part of programming any kind of display editor has always 
been efficiently maintaining fidelity between the internal representation 
and the display. Some editors have a tight coupling between the 
functions that change the file and the functions that maintain the 
screen; usually the functions that change the file also update the screen. 
Such a scheme is easy to implement for simple editors, but makes 
adding new functions quite difficult. The programmer must know both 
how to change the internal file representation, and how to rewrite just 
those parts of the screen that have changed. This process is even more 
complicated in PEN; a node must be reformatted as well as being 
redisplayed. 

PEN decouples the functions that change the document tree from 
formatting and display functions. An editing function making changes 
to the document tree merely indicates which nodes are affected by the 
change by removing the boxes corresponding to those nodes. Reformat 
and redisplay are performed efficiently when a request is made to 
synchronize the display with the current state of the document (e.g. 
when the editing function finishes.) 

Formatting a node means producing a box for that node suitable for 
display on the screen. The entire document tree need never be 
completely formatted, only that part of the tree which is currently being 
displayed. 

Because an internal node is a simple vertical or horizontal composition 
of its children, it is easy to format only a portion of that node. For 
example, a “chapter” might be defined as the vertical composition of a 
number of “sections.” Since it is likely that only one or two sections at
most might fit on a screen at once, it is not necessary to format all of 
them. 

A leaf node, on the other hand, is a node that must be completely 
formatted to be displayed, because its formatting function may do 
anything as long as it yields a box. It would be impossible to have a 
general mechanism for computing only portions of that box. 

As the user moves through the tree, PEN formats an area “around” the 
current node, called the “start node.” It is not sufficient to format just
the start node and its subtree, because the visual context of that node
(i.e. surrounding nodes) is also important. Incremental formatting
begins by assigning the start node a position on the display. The tree is 
then traversed starting from there, formatting from “the middle of the 
tree out.” This process stops when the formatted neighborhood 
surrounding the start node fills the screen. The boxes resulting from the 
formatting are attached to the nodes in the document tree. Note that as 
the tree is traversed, the formatting environment must be updated (i.e. 
the parameter-value pairs hanging off the nodes and their definitions
must be bound and unbound as appropriate.) 

One efficiency is possible when reformatting a subtree: the ancestors of 
the root of the subtree need not be reformatted if the subtree’s box has 
not changed appreciably. This is because the composition operator 
does not depend on content, it references only the box’s size and the 
positions of its baselines. Thus, leaf nodes such as paragraphs can have 
specialized reformatting functions that will not trigger further 
formatting unless the paragraph’s box has changed. These nodes can 
also have redisplay functions that limit their processing to the display 
area occupied by the node. 
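
The interplay between invalidation and this shortcut can be sketched as follows. The Python below is a simplified model under our own assumptions: a box is reduced to a (width, height) pair, every internal node is a vertical composition, "not changed appreciably" is taken to mean an identical size, and the names DocNode, edit and synchronize are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class DocNode:
    text: str = ""                        # content, used by leaves only
    children: list = field(default_factory=list)
    parent: Optional["DocNode"] = None
    box: Optional[tuple] = None           # (width, height) once formatted
    prev: Optional[tuple] = None          # size of the box removed by the last edit

    def __post_init__(self):
        for child in self.children:
            child.parent = self


formatted = []                            # trace of formatting calls, for illustration


def format_node(node):
    """Leaves measure their text; internal nodes stack their children vertically."""
    formatted.append(node.text or "<inner>")
    if not node.children:
        return (len(node.text), 1)
    widths, heights = zip(*(child.box for child in node.children))
    return (max(widths), sum(heights))


def format_tree(node):
    for child in node.children:
        format_tree(child)
    node.box = format_node(node)


def edit(node, text):
    """An editing function changes content and merely removes the stale box."""
    node.text = text
    node.prev, node.box = node.box, None


def synchronize(node):
    """Reformat upward from `node`, stopping once a box keeps its old size."""
    while node is not None:
        previous = node.prev if node.box is None else node.box
        node.box = format_node(node)
        if node.box == previous:
            break                         # ancestors compose on size alone
        node = node.parent


p1, p2 = DocNode("hello"), DocNode("hi")
chapter = DocNode(children=[DocNode(children=[p1, p2])])
format_tree(chapter)
```

Replacing "hi" by another two-character string reformats only that paragraph, whereas lengthening "hello" changes the paragraph's box and therefore reformats the section and the chapter as well.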

Formatting a contiguous subsection of the document structure 
produces a corresponding tree structure of boxes, the box tree. The box 
tree is mapped to the display by the display processor. 

When the display processor is asked to refresh the screen, it begins 
traversing the entire box tree from the root down, propagating display 
coordinates from box to box in the tree. If a box is already displayed 
on the screen in the correct position, nothing more need be done with it. 
If a box does not overlap with the screen then it does not have to be 
displayed and the tree traversal may be pruned. If a box overlaps the 
screen, but has never been displayed, then a display request is generated 
that asks that the box be displayed. In the final case, where a box is 
displayed in the wrong location on the screen, a move request is 
generated. We assume it is more efficient to move bits on the screen 
than to redisplay a box (e.g. the machine has a BitBlt, or Raster-Op,
instruction [8, 13]). One further request type is generated when a node
is deleted from the document tree. If a box is currently displayed when
it is deleted, a request is produced to “blank out” the area occupied by
the box.

As the box tree is traversed, the requests are not processed immediately 
but are queued, for there might be two conflicting requests. A request 
to blank out a portion of the screen, which is generated by a move as 
well as an erase request, is partially overridden by a move into an 
overlapping area. A dependency graph is built from the request queue, 
where an arc in the graph represents a request whose destination 
overlaps with the source of another request. It is easy to show that this 
graph is a dag, as we are guaranteed that boxes do not overlap when on 
the screen. The requests are then processed in topological order, 
ensuring that no box is overwritten before it is copied to its new
location. 
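
The ordering step can be illustrated with Python's standard graphlib module. This is a sketch under stated assumptions rather than PEN's algorithm: the Rect and Move types and the function order_requests are hypothetical, only move requests are modelled, and the arcs are found by comparing every pair of requests, which is quadratic in the number of requests.

```python
from graphlib import TopologicalSorter
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


class Move(NamedTuple):
    name: str
    src: Rect      # where the box is currently displayed
    dst: Rect      # where the box should be displayed


def overlaps(a: Rect, b: Rect) -> bool:
    """True if two screen rectangles share at least one pixel."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def order_requests(moves):
    """Order moves so that no source area is overwritten before it has been copied."""
    sorter = TopologicalSorter()
    for move in moves:
        sorter.add(move.name)
    for a in moves:
        for b in moves:
            # a's destination covers b's source, so b must be processed first
            if a is not b and overlaps(a.dst, b.src):
                sorter.add(a.name, b.name)
    return list(sorter.static_order())   # raises CycleError if the graph is not a dag


# Two lines scroll up by ten pixels after the line above them is deleted.
second = Move("second", Rect(0, 10, 80, 10), Rect(0, 0, 80, 10))
third = Move("third", Rect(0, 20, 80, 10), Rect(0, 10, 80, 10))
order = order_requests([third, second])
```

The third line's destination is the second line's current position, so the second line is moved first even though its request was queued later.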

This algorithm can be implemented efficiently to give fast response on a 
single-user machine. If there are n display requests and m arcs in the 
dependency dag, the algorithm runs in time O((n+m) log n) [1]. The
number of requests for any screen refresh is usually quite small (under 
15), because the average edit and display operation is quite simple. This 
method is fast enough; ad hoc methods in which an edit operation 
orchestrates the screen update are not worth the added complexity. 
Formatting and display are efficient, yet simple and independent of 
editing and the rest of the system. 

2.4. Editing 

This section presents PEN’s structural editing commands: the
commands that alter the document tree. Character-level editing in PEN 
is similar to that encountered in many screen editors. The user moves a 
cursor to the location where he wishes to enter text and starts typing. 




There are a small number of basic commands used to modify the 
structure of a document. Commands may be illegal in a given location 
due to the syntactic restrictions imposed by the document structure. 
For example, inserting a chapter in the middle of another chapter is an 
illegal operation, because only sections are allowed within chapters. 
However, the intent of the user is clear; he wants the chapter to be 
decomposed into its sections and each of these sections to be inserted in 
turn. This sort of decomposition could be done automatically. Such 
“type coercions” are in the province of the user interface and are not
handled by the editing primitives. 

The following are the primitives used to manipulate the document tree. 
They function only if they preserve legal structure: 

Create  Inserts a default instance of a node of a particular type at
        the current location.

Pick    A copy of the node is stored away in a document we call a
        “pick buffer.”

Delete  The current node is deleted. If the document structure
        requires that a node of this type be present at this location, a
        default instance is reinserted. A deletion does an implicit
        Pick.

Put     A pick buffer is inserted in the current location.

Split   The parent of the current node is split at the current
        node, creating two new nodes. The siblings to the left of the
        current node make up a prefix of the children of the new
        node to the left; the current node and its brothers to the
        right form a suffix of the other new node.

Join    Two adjacent nodes on the same level are joined into one
        node that has both their children.
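
The logical effect of Split and Join can be sketched on a bare tree. The Python below is an illustration under our own assumptions: TreeNode, split and join are hypothetical names, children of the lowest level are plain strings, and the check that an operation preserves legal structure is reduced to requiring that joined nodes be adjacent and of the same type.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class TreeNode:
    kind: str
    children: list = field(default_factory=list)


def split(grandparent, parent, index):
    """Split `parent` before child `index`, replacing it by two nodes of the same type."""
    left = TreeNode(parent.kind, parent.children[:index])
    right = TreeNode(parent.kind, parent.children[index:])
    pos = grandparent.children.index(parent)
    grandparent.children[pos:pos + 1] = [left, right]
    return left, right


def join(grandparent, left, right):
    """Join two adjacent nodes on the same level into one node with both their children."""
    siblings = grandparent.children
    pos = siblings.index(left)
    adjacent = pos + 1 < len(siblings) and siblings[pos + 1] is right
    if not adjacent or left.kind != right.kind:
        raise ValueError("join requires two adjacent nodes of the same type")
    merged = TreeNode(left.kind, left.children + right.children)
    siblings[pos:pos + 2] = [merged]
    return merged


section = TreeNode("section", ["p1", "p2", "p3"])
chapter = TreeNode("chapter", [section])
first, second = split(chapter, section, 1)    # ["p1"] and ["p2", "p3"]
rejoined = join(chapter, first, second)       # ["p1", "p2", "p3"] again
```

Splitting at the current node and then joining the two results restores the original list of children, which is a convenient sanity check for any node-specific implementation of the two primitives.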

These primitives describe the logical effect of an editing operation; the 
actual function implementing the change is determined by a particular 
node. For example, the function that implements a word split does not 
do the same things as one that splits chapters. Some node types, e.g. a 
graphical figure generated by a formula, may not be able to 
meaningfully perform certain operations. This is considered further in 
Section 3.6, where we discuss the editing of mathematics. 

The user can edit parameters other than the textual content of a node. 
Changing a parameter in a node instantiation changes it only for that 
node; changing it in a node template changes the value for all 
instantiations of the template that do not override the parameter. Some 
attempt is made to determine the effect a change has on previously 
formatted instantiations of the node, but often complete reformatting is 
required. Node definitions have a textual representation and thus can 
be edited and incorporated into the system, defining new types of nodes.

3. PEN-MATH 

An important feature of document tree nodes is that they encapsulate 
all knowledge of the entry and manipulation of node content. This is 
particularly important for mathematics, figures, graphs, etc., which 
must be handled in ways quite different from ordinary text. The 
stylized interface between the document tree and the formatting 
functions in its nodes allows the development of these functions to be 
completely independent of the document tree. In particular, it was 
possible to develop PEN-MATH (a notation for mathematics 
formatting and evaluation) independently of the rest of PEN. This 
section describes PEN-MATH. 

As mentioned in the introduction, preparing printed text containing 
mathematical formulae and objects is a difficult and unpleasant task. 
There are a variety of reasons for this. The language of mathematics 
employs a large assortment (more than 100 by our estimation) of 
symbols not found in ordinary text. Type face (font) also carries 
information. Where most manuscripts employ only two or three
typefaces, a scientific paper frequently uses five or more fonts. Type size
conveys information, too. Compounding the difficulties, mathematics
is a two-dimensional language; vertical and horizontal position are as
important as the letters and symbols. As a consequence, both the
typing and the printing of mathematical text are arcane arts. It is
difficult to convey precise intent to the typist (or printer), yet more
difficult for that person to do what you’ve asked.

Systems such as TEX and EQN provide simple mechanisms for 
handling the myriad symbols, type faces, and type sizes, and apply 
varying degrees of expertise on the rules of mathematical typography. 
However, these systems have only rudimentary knowledge of 
mathematics and mathematical structures. They reduce the difficulty of 
specifying linear expressions and vectors to an acceptable level, but 
contribute little towards reducing the difficulty of specifying complex 
arrays or tables of formulae. 

We hypothesize that these systems fall short because they are primarily 
text processing systems with a little bit of mathematics formatting 
added. They are not mathematics text processing systems. In 
PEN-MATH we are concerned with the syntax and semantics of 
mathematics formatting and with the exigencies of displaying 
mathematical text. 

The input language is based on APL because it concisely describes the 
manipulations of expressions, arrays, and sequences most commonly 
used in mathematics formatting. Just as APL can reduce a matrix 
multiply subroutine to five characters, so will PEN-MATH reduce the 
time and effort required to describe and format large and complex 
arrays. However, we must interject a caveat: PEN-MATH will not 
provide much savings for linear expressions and simple vectors. 
Existing systems handle these quite well. The only potential savings are 
the reduction of some multiple key stroke operations to single 
characters. However, PEN-MATH will reduce the command strings 
generating complex structures such as the array in figure 3-1 from 
hundreds of characters to dozens (the example is generated by a seven 
character expression). 


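\[
\begin{array}{ccccc}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}
\end{array}
\]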

Figure 3-1: Array generated by: A Jim Un 

Often an author may wish to present both formulae and evaluations of 
them. No currently existing system is of assistance as they cannot 
evaluate the mathematical forms being set. The ability to both evaluate 
and format will be a unique feature of PEN-MATH. 

We have attempted to describe PEN-MATH via examples and 
discussions of specific issues. While not comprehensive, they illustrate 
our approach to the difficulties of specifying and formatting formulae 
and arrays. As stated above, the input language for PEN-MATH is 
derived from APL. We assume the reader is familiar with APL and 
describe only those features of PEN-MATH which either differ from 
APL or are of particular interest to mathematics formatting. APL 
operators not mentioned can be assumed to function analogously to 
those illustrated. 

3.1. Basic Linear Forms 

A goal of PEN-MATH is to allow user input to be as close as possible 
to what he would write on paper. Operators are usually not evaluated, 
but serve as formatting commands. Thus, “1 + 2” produces “1 + 2”, and
similarly for the usual operators (×, ÷, −, ∪, ∈, <, ≤, etc.). The basic
formatting operation may be described as follows: 

• Evaluate and format the right and left arguments. If necessary 
enclose them in parentheses. 

• Format the operator according to context and its looks 
(described below). 

• Adjust the space on each side of the operator to appear to be
the same. Make the visual horizontal centers of the arguments
and operator line up.

Similar formatting is provided for prefix notation. E.g. “sin(x)”
produces “sin(x)” (or “sin x”) with appropriate font, size, and other
formatting adjustments.

Difficulty arises with some operators that have several accepted 
representations. Multiplication, for example, may be written as “a × b”,
“a.b”, or “ab”. So long as evaluation is not required, any input
producing the desired output is acceptable. Thus, “a × b”, “a.b”, and
“ab” may be used to produce the above output. We expect the
utilization of evaluation and other esoteric features of PEN-MATH to
be rather infrequent and attempt to anticipate the desires of the “naive”
user. The more sophisticated user must be prepared to program 
carefully to ensure correct evaluation and fancy formatting. 

The problem with using “ab” or “a.b” instead of “a×b” is that the
system may not understand that the computational intent is to multiply
“a” and “b”. It may produce something nice, but it will not be able to
apply any special knowledge it may have about “times” (e.g. rules about
when parentheses are required, or how to evaluate expressions). On the
other hand, if you write “a×b” so the system understands the
computational intent, how can it be instructed to produce “ab” as the
formatted output? 

In PEN-MATH all formatting operators have attributes or looks which 
control the way they appear in the output. (Looks are virtually 
identical to SCRIBE's environment attributes.) The looks of any 
operator may optionally be written after it in curly braces. Thus, 
“a × b”, “a·b”, and “ab” might be produced by “a × {;cross} b”, 
“a × {;dot} b”, and “a × {;juxt} b” respectively. There is, of course, a 
mechanism for telling PEN-MATH just what looks to assume so they 
need not be specified each time. Most operators have looks which 
control: the right amount of space to separate the operator from its 
operands; how large to print the symbol relative to the size of its 
operands; how to adjust the horizontal base lines of its operands 
relative to its own; alternate ways to format the operator; and (possibly) 
a mathematical definition of the operator so it may be evaluated. 
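
The role of looks can be sketched outside PEN itself. The following Python fragment is only an illustration, not PEN's Lisp code: the look names (cross, dot, juxt) come from the text above, while the dictionary layout and the function name format_times are assumptions made for the example. The input always names the times operator, so the computational intent is preserved, and the look alone decides the printed form.

```python
# Hypothetical table of looks for the times operator. The look names come
# from the text ({;cross}, {;dot}, {;juxt}); the data layout is an assumption.
TIMES_LOOKS = {
    "cross": " × ",  # printed as: a × b
    "dot": "·",      # printed as: a·b
    "juxt": "",      # printed as: ab
}


def format_times(left: str, right: str, look: str = "cross") -> str:
    """Format a product without evaluating it, using the selected look."""
    return f"{left}{TIMES_LOOKS[look]}{right}"


for look in ("cross", "dot", "juxt"):
    print(look, "->", format_times("a", "b", look))
```

A default look (here "cross") plays the part of the mechanism for telling the system which looks to assume when none is written.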

Due to the large number of mathematics/formatting operators in the 
system we found it convenient to adopt APL's convention of evaluating 
all expressions in a strictly right to left order except where modified by 
parentheses. The formatter includes parentheses in the output only if 
required or requested by the user. 
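
A small sketch may make the interplay of right-to-left grouping and parenthesization concrete. It is written in Python under several assumptions that are not taken from PEN-MATH: only the four operators +, -, ×, ÷ are handled, the function names are hypothetical, and the rule for when parentheses are "required" is the conventional precedence of written mathematics rather than anything stored in operator looks.

```python
import re

# Conventional precedence used only when printing (an assumption of this sketch).
PRECEDENCE = {"+": 1, "-": 1, "×": 2, "÷": 2}
NON_ASSOCIATIVE = {"-", "÷"}


def tokenize(text):
    return re.findall(r"[A-Za-z0-9]+|[+\-×÷()]", text)


def parse(tokens):
    """Parse with APL grouping: each operator takes everything to its right."""
    tree, rest = _expr(tokens)
    if rest:
        raise ValueError(f"unexpected tokens: {rest}")
    return tree


def _expr(tokens):
    left, rest = _operand(tokens)
    if rest and rest[0] in PRECEDENCE:
        right, remaining = _expr(rest[1:])
        return (rest[0], left, right), remaining
    return left, rest


def _operand(tokens):
    if not tokens or tokens[0] in PRECEDENCE or tokens[0] == ")":
        raise ValueError("operand expected")
    if tokens[0] == "(":
        inner, rest = _expr(tokens[1:])
        if not rest or rest[0] != ")":
            raise ValueError("missing right parenthesis")
        return inner, rest[1:]
    return tokens[0], tokens[1:]


def _prec(tree):
    return 3 if isinstance(tree, str) else PRECEDENCE[tree[0]]


def render(tree):
    """Print the tree, adding parentheses only where the reading would change."""
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    l, r = render(left), render(right)
    if _prec(left) < PRECEDENCE[op]:
        l = f"({l})"
    if _prec(right) < PRECEDENCE[op] or (
        _prec(right) == PRECEDENCE[op] and op in NON_ASSOCIATIVE
    ):
        r = f"({r})"
    return f"{l} {op} {r}"


print(render(parse(tokenize("n × n + 1"))))
print(render(parse(tokenize("a + b × c"))))
```

In this sketch the input “n × n + 1” groups as n × (n + 1), so the printed form gains parentheses the user never typed, while “a + b × c” needs none.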


3.2. Simple Arrays 

An important feature of PEN-MATH is its understanding of 
mathematical structures, particularly arrays. Scalar operators extend 
to arrays as they do in APL. Thus, “1 2 3 + 4 5 6” produces 
“1+4 2+5 3+6” by applying the “+” format operator to successive pairs 
of elements from the vectors “1 2 3” and “4 5 6”. The result is a vector of 
expressions. Similarly, “1 + a b c” produces “1+a 1+b 1+c”. The 
substring “a b c” is a vector of unevaluated symbolic variables. 
Identifiers are treated as unevaluated symbols unless the user or the 
context requires otherwise. Symbolic variables are treated as atomic 
objects and may be elements of symbolic arrays. 
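
The element-by-element extension just described can be sketched in Python. This is an illustration under stated assumptions, not PEN-MATH's implementation: scalars are represented as strings, vectors as lists of strings, and the function name extend is hypothetical.

```python
def extend(op, left, right):
    """Apply a formatting operator element by element, APL style.

    Scalars are strings and vectors are lists of strings. Nothing is
    evaluated: the result is a vector of formatted expressions.
    """
    if isinstance(left, list) and isinstance(right, list):
        if len(left) != len(right):
            raise ValueError("vectors must have the same length")
        return [f"{l}{op}{r}" for l, r in zip(left, right)]
    if isinstance(left, list):
        return [f"{l}{op}{right}" for l in left]
    if isinstance(right, list):
        return [f"{left}{op}{r}" for r in right]
    return f"{left}{op}{right}"


print(" ".join(extend("+", ["1", "2", "3"], ["4", "5", "6"])))
print(" ".join(extend("+", "1", ["a", "b", "c"])))
```

The two calls correspond to the inputs “1 2 3 + 4 5 6” and “1 + a b c” discussed above.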


PEN-MATH contains many operators for the manipulation of arrays. 
These are carried over from APL with only slight modification. A 
useful example is reduction. The expression “+/ 1 2 3” will produce 
“1+2+3”. Reduction is useful when combined with the index 
generator (“ι”), e.g. 

    ((n × n + 1) ÷ 2) = +/ ιn 

produces the identity 

\[ \frac{n(n+1)}{2} = 1 + 2 + 3 + \cdots + n \]

Notice that the system adjusts the parentheses and inserts ellipsis in the 
series (this mechanism is explained in Section 3.4). 

An idiom exists for producing alternating sequences. Using 
PEN-MATH's index generator (“ι”; the dyadic form differs from 
APL, see table 3-2 in section 3.4) to generate the sequence of integers 
from 0 to ∞, the expression 

    sin(x) = '-+' / (x * 1 + 2 × 0 ι{4} ∞) ÷ ! 1 + 2 × 0 ι{4} ∞ 

(the set of integers has been extended to include +∞ and −∞) will 
produce the series definition of sin(x): 

\[ \sin(x) = \frac{x^{1}}{1!} - \frac{x^{3}}{3!} + \frac{x^{5}}{5!} - \frac{x^{7}}{7!} + \cdots \]



The subexpression “ '-+' ” is an operator vector. Reduction works by 
placing successive elements of the operator vector between successive 
pairs of elements of its right argument. Thus, “ '+÷' / ... ” produces a 
continued fraction. 
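
Reduction with an operator vector can be sketched as follows. The Python below is a minimal illustration, with one assumption that the text does not state explicitly: when the operator vector is shorter than the number of gaps between elements, its elements are reused cyclically. The function name reduce_format is hypothetical.

```python
from itertools import cycle


def reduce_format(operators, items):
    """Place successive operators between successive pairs of items.

    `operators` is a string such as "-+"; it is reused cyclically
    (an assumption of this sketch). Nothing is evaluated.
    """
    if not items:
        raise ValueError("reduction needs at least one element")
    parts = [items[0]]
    for op, item in zip(cycle(operators), items[1:]):
        parts += [op, item]
    return " ".join(parts)


print(reduce_format("+", ["1", "2", "3"]))
print(reduce_format("-+", ["a", "b", "c", "d"]))
```

With a single operator this is ordinary reduction; with the vector '-+' the signs alternate, which is the idiom used for the sine series.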


PEN-MATH has limited ability to perform symbolic arithmetic. Thus, 
if the looks of “!”, “*”, and “÷” included the identities “1 = 1!”, 
“x = x*1”, and “x = x÷1”, then the definition of sine given above would 
be formatted as: 

\[ \sin(x) = x - \frac{x^{3}}{3!} + \frac{x^{5}}{5!} - \frac{x^{7}}{7!} + \cdots \]


Such knowledge is ad hoc and limited. These abilities are included only 
so the user need not worry about standard special cases such as those 
illustrated above. The expressive power of PEN-MATH is obvious. In 
a conventional system it would take a minimum of 40 keystrokes 
(actual count would probably be well over 100 and not work correctly 
for the first 2 or 3 tries) to produce the above definition of sine. 
However, in PEN-MATH only 26 keystrokes are required. Not only 
is typing time greatly reduced, but, collaterally, so is thinking and 
debugging time. 


3.3. Subscripting 

During formatting, subscription is an array builder, not an element 
selector. Subscripts are indicated with the down arrow (“↓”), e.g. “X↓n” 
produces X_n. Up arrow (“↑”) is used for superscription, e.g. “X↑n” 
produces X^n. There are six places where indices could be placed. This 
is controlled by a “look” of the subscript operator, to wit: 

    X↓{1}1 ↓{2}2 ↓{3}3 ↓{4}4 ↓{5}5 ↓{6}6 

produces 

\[ {}^{4}_{5}\overset{3}{\underset{6}{X}}{}^{2}_{1} \]

Unlike other operators, subscription extends to arrays via outer 
product, not element by element pairing. This feature makes 
subscription a very powerful array builder as is demonstrated in table 
3-1. Table 3-1 gives further evidence of the expressiveness of 
PEN-MATH. In a conventional system the number of keystrokes 
required to format an array is at least four times the product of the 
dimensions of the object produced. In PEN-MATH the number of 
keystrokes will be about two times the sum of the dimensions of the 
produced object plus its rank. The key stroke count is even smaller 
when array generators such as index (“ι”) and reshape (“ρ”) are used. 
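
The outer-product behaviour of subscription, and the keystroke comparison, can be worked through in a short Python sketch. Everything here is illustrative: the function names are hypothetical, subscripts are written with an underscore, and the PEN-MATH estimate is read as 2 × (sum of dimensions) + rank, which is one plausible reading of the sentence above.

```python
from math import prod


def subscript(bases, indices):
    """Outer product of subscription: every base paired with every index."""
    return [[f"{b}_{i}" for i in indices] for b in bases]


def conventional_keystrokes(shape):
    """Lower bound quoted in the text: four times the product of the dimensions."""
    return 4 * prod(shape)


def pen_math_keystrokes(shape):
    """Estimate from the text, read as 2 * (sum of dimensions) + rank."""
    return 2 * sum(shape) + len(shape)


for row in subscript(["a", "b", "c"], [1, 2, 3]):
    print(" ".join(row))

shape = (3, 4)
print(conventional_keystrokes(shape), "vs", pen_math_keystrokes(shape))
```

For a 3 by 4 array the two estimates are 48 and 16 keystrokes, and the gap widens as the array grows because one figure scales with the product of the dimensions and the other with their sum.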

Subscripting is the single exception to right associativity in 
PEN-MATH. Namely, in order that “x↑i↑j” produce x^{ij} as expected, 
the subscript operators must be left associative. Parentheses must be 
used to produce subscripted subscripts, e.g. “X ↓ (a ↓ i)” produces X_{a_i}. 
When a subscripted object must be embedded in an expression, its 
rightmost subscript must be terminated by a right parenthesis or a 
semicolon. E.g. “a ↑ 1+2 ↓ 3+4; + 5” and “(a ↑ 1+2 ↓ 3+4) + 5” both 
produce a^{1+2}_{3+4} + 5, but “a ↑ 1+2 ↓ 3+4 + 5” produces a^{1+2}_{3+4+5}. 


    PEN-MATH expression        formatted output (resultant object)

    a ↑ 1                      a^1

    a ↑ 1 2 3                  a^1  a^2  a^3

    a b c ↑ 2                  a^2  b^2  c^2

    a b c ↑ 1 2 3              a^1  a^2  a^3
                               b^1  b^2  b^3
                               c^1  c^2  c^3

    a ↓ 1 2 3 ↑ 1 2 3 4        a_1^1  a_1^2  a_1^3  a_1^4
                               a_2^1  a_2^2  a_2^3  a_2^4
                               a_3^1  a_3^2  a_3^3  a_3^4

    (a ↓ 1 2 3) ↑ 1 2 3        a_1^1  a_1^2  a_1^3
                               a_2^1  a_2^2  a_2^3
                               a_3^1  a_3^2  a_3^3

Table 3-1: Examples of array generation via subscripting. 


    PEN-MATH expression    array pattern shape    array pattern (formatted output)

    n ι m                  m-n+1                  n n+1 n+2 ... m
    n ι{1} m               m-n+1                  n ... m
    n ι{4 2} m             m-n+1                  n n+1 n+2 n+3 ... m-1 m
    n ι{0 0} m             m-n+1                  n n+1 n+2 ... m
    9 ι 3                  7                      9 8 7 ... 3
    9 ι{0 0} 3             7                      9 8 7 6 5 4 3
    ι{4 2} ∞               ∞                      1 2 3 4 ...
    −∞ ι{4 2} 0            ∞                      ... −1 0

Table 3-2: Controlling array pattern show with index generator looks. 


3.4. Array Patterns 

Often, when an array’s structure can be inferred from a few of its 
elements it is not displayed in full, but is abbreviated with ellipsis. 
When ellipsis is used we refer to the formatted object as an array 
pattern. PEN-MATH incorporates several features which simplify the 
generation of array patterns: 

• Array patterns are input as if they were arrays. Expressions 
generating array patterns are usually indistinguishable from 
those creating arrays. 

• Index generator (“ι”) creates primitive array patterns. 

• An algebra defining the composition of array patterns has 
been developed. Most commonly occurring array patterns are 
a simple sequence of pattern compositions. 

• Ellipsis has been added to the number set to allow 
explicit creation of array patterns. 

• Array patterns have looks which allow the user to control the 
number of elements which appear before and after the ellipsis. 

• The formatter has algorithms specifically for array patterns. 

Most array patterns are based on the sequence of consecutive integers 
created by index generator (“ι”). The monadic form generates the 
integers from 1 to the right argument; the dyadic form returns the 
integers from the left argument to the right argument (a significant 
departure from APL). The arguments may be either symbolic or 
numbers (including ∞). The right argument need not be larger than the 
left argument, though this will be assumed if both arguments are 
symbolic. Table 3-2 illustrates use of ι. 

Index generator’s looks indicate how many elements should appear at 
the head (beginning) of the pattern and how many should appear at 
its tail (end). This information, known as the show of the array pattern, 
is a positional look of ι and is given as a vector of two integers. The 
default show of patterns is {3 1}, but this may be changed by the user. 
Specifying a show of {0 0} is an idiomatic way of asking the formatter to 
display the entire array. If the array is infinite or if its shape contains 
symbolic elements, then an array pattern with default show is produced. 
(Constant vectors included in PEN-MATH expressions have {0 0} as 
their show.) Examples of the interaction between ι and show are given 
in table 3-2. 
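
The effect of show on a numeric index pattern can be sketched in Python. This is a simplified illustration, not PEN-MATH's algorithm: it handles only finite integer arguments (symbolic and infinite bounds are left out), and the function name index_pattern is an assumption.

```python
def index_pattern(start, stop, show=(3, 1)):
    """Format the integers from start to stop with a given show.

    `show` is (head, tail): how many elements to print before and after
    the ellipsis. A show of (0, 0) asks for the whole array. Only finite
    integer bounds are handled in this sketch.
    """
    step = 1 if stop >= start else -1
    items = [str(k) for k in range(start, stop + step, step)]
    head, tail = show
    if (head == 0 and tail == 0) or head + tail >= len(items):
        return " ".join(items)
    shown = items[:head] + ["..."] + items[len(items) - tail:]
    return " ".join(shown)


print(index_pattern(9, 3))
print(index_pattern(9, 3, show=(0, 0)))
print(index_pattern(1, 10, show=(4, 2)))
```

The first two calls mirror the “9 ι 3” and “9 ι{0 0} 3” rows of table 3-2. When the requested head and tail already cover the whole array, the sketch prints it in full rather than inserting an ellipsis that hides nothing.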

It is possible to create multi-dimensional array patterns via pattern 
composition. Such a pattern may have a different show attribute in 
each direction (dimension). We represent the show of any pattern as a 
vector of integers of even length, where the first pair of elements give the 
show of the pattern’s first dimension, the second pair give the show of 
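the pattern’s second dimension, etc. For example, the array pattern 

\[
\begin{matrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
\vdots & \vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}
\end{matrix}
\]

is generated by “a ↓ι{2}m ↓ιn” and has show 2 1 3 1. (N.B.: “ι{2}” is 
equivalent to “ι{2 1}” as 1 is the default show for the tail of an array 
pattern. The effect is that only the first two rows of the pattern are 
shown.) 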

As in APL, the shape of an array or array pattern is represented by a 
vector specifying the depth of the array in each dimension. However, as 
the arguments to index generator and reshape may be symbolic, some 
elements of the shape may be symbolic. Table 3-2 contains examples of 
patterns with symbolic shape vectors. 

To explain the various compositions of array patterns we must define 
the following terms and notations: 

• A scalar function is a mapping of scalars to scalars. The 
standard mathematical functions are all scalar functions. As 
in APL, they can be extended to arrays. 

• The remaining operators in PEN-MATH are all structure 
modifiers, functions which either create arrays or modify the 
content or structure of arrays. 

• Let “f” and “g” be arbitrary scalar functions. 

• Let “A", “B”, and “C” be arbitrary array patterns. 

The application of a scalar function to an array pattern and a scalar 
produces an array pattern whose shape and show are the same as for the 
original pattern. Monadic operators applied to array patterns also 
function this way. 

“C ← A f B” is a pattern if A and B conform (i.e. corresponding 
members of their shapes must either both be symbolic or be equal 
numbers). Assuming corresponding symbolic elements of the shapes of 
A and B to be numerically equal, we choose to give C A’s shape. We 
assume the user has made the elements of the show of an array to be just 
large enough to allow the array’s structure to be inferred from the 
information shown, and that if any element of the show were reduced 
this would not be possible. Consequently, we set C’s show to be the 
maximum of the shows of A and B. 

The outer product “C ← A ∘.f B” is an array pattern whose shape and 
show are the concatenation of A’s and B’s shape and show, respectively. 
(An example of the usage of outer product is given at the end of this 
section.) As already described (and illustrated in table 3-1), 
subscription extends to arrays via outer product and can also create 
array patterns. 
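
The two composition rules above (element-wise scalar functions and outer product) can be summarized in a Python sketch. The class and function names are hypothetical, symbolic shape elements are represented as strings, and the special meaning of a {0 0} show is not modelled; the sketch only records how shape and show combine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    shape: tuple  # one entry per dimension: an int or a symbol name (str)
    show: tuple   # flattened (head, tail) pairs, one pair per dimension


def conform(a, b):
    """Shapes conform if each pair is both symbolic or equal numbers."""
    if len(a.shape) != len(b.shape):
        return False
    for x, y in zip(a.shape, b.shape):
        both_symbolic = isinstance(x, str) and isinstance(y, str)
        equal_numbers = isinstance(x, int) and isinstance(y, int) and x == y
        if not (both_symbolic or equal_numbers):
            return False
    return True


def scalar_compose(a, b):
    """C <- A f B: C takes A's shape and the element-wise maximum show."""
    if not conform(a, b):
        raise ValueError("patterns do not conform")
    return Pattern(a.shape, tuple(max(p, q) for p, q in zip(a.show, b.show)))


def outer_product(a, b):
    """C <- A o.f B: shapes and shows are concatenated."""
    return Pattern(a.shape + b.shape, a.show + b.show)


rows = Pattern(("m",), (2, 1))
cols = Pattern(("n",), (3, 1))
print(outer_product(rows, cols))
print(scalar_compose(Pattern((5,), (3, 1)), Pattern((5,), (4, 2))))
```

The outer product of a pattern with show 2 1 and one with show 3 1 has show 2 1 3 1, in agreement with the example given earlier in this section.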

Reduction along the i’th coordinate, “C ← f/[i] A”, produces a pattern 
with the i’th coordinate “compressed” into a single element as in APL. 
Although the elements of the resultant array pattern are scalars, they 
bear closer resemblance to array patterns than to integers and symbolic 
variables. We call the objects resulting from reduction series patterns. 
They are scalars, but have length (shape) and show as do array patterns 
(which could be called sequence patterns). The definition of sine given 
in Section 3.2 is an excellent example of a series pattern. The inner 
product “C ← A f.g B” produces an array pattern whose elements are 
series patterns. (An example of the usage of inner product is given at 
the end of section 3.5.) 

Other structure manipulators such as transpose, concatenate, reverse, 
rotate, and scan also produce array patterns. Their definition is 
analogous to the examples given above. Reshape and take occasionally 
produce array patterns based on replication. 

Many useful array patterns require a bit of programming and depend 
upon PEN-MATH’s facility for symbolic evaluation. For example, the 
diagonal array 

\[
\begin{matrix}
a_{11} & 0      & 0      & \cdots & 0      \\
0      & a_{22} & 0      & \cdots & 0      \\
0      & 0      & a_{33} & \cdots & 0      \\
\vdots & \vdots & \vdots &        & \vdots \\
0      & 0      & 0      & \cdots & a_{nn}
\end{matrix}
\]

is generated by “a ↓ιn ↓ιn; αα × (ιn) ∘.= ιn”. The “αα” in the 
expression tells PEN-MATH to evaluate all operators to the right 
before formatting. 
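
The construction of the diagonal array can be followed step by step in a Python sketch. It assumes a numeric n (the symbolic case with ellipsis is not modelled), writes subscripts with an underscore, and uses hypothetical function names; the only symbolic knowledge it encodes is that multiplying by 0 gives 0 and multiplying by 1 leaves the symbol unchanged.

```python
def sym_times(symbol, k):
    """Symbolic product of a symbol and a number, with the 0 and 1 identities."""
    if k == 0:
        return "0"
    if k == 1:
        return symbol
    return f"{k}{symbol}"


def diagonal(name, n):
    """Subscripted n-by-n array times the evaluated identity matrix."""
    idx = range(1, n + 1)
    subscripted = [[f"{name}_{i}{j}" for j in idx] for i in idx]  # outer product of subscripts
    identity = [[int(i == j) for j in idx] for i in idx]           # outer product with equality
    return [
        [sym_times(s, k) for s, k in zip(s_row, k_row)]
        for s_row, k_row in zip(subscripted, identity)
    ]


for row in diagonal("a", 3):
    print("  ".join(row))
```

The equality outer product is evaluated to 0s and 1s first, and the element-wise product then keeps only the diagonal symbols, which is the role the text assigns to evaluation before formatting.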

3.5. Symbolic Evaluation 

PEN-MATH evaluates all input expressions prior to formatting. We 
have attempted to design the interaction between the parser and the 
evaluator so that formatting occurs as expected by a relatively 
unsophisticated user. Thus, scalar operators are usually not evaluated 
(we assume the appropriate formatting of “1+2” is “1+2”, not “3”), 
and all structure modifiers are evaluated (we assume the correct 
formatting of “ι3” is “1 2 3”, not “ι 3”). A notable exception to this rule 
is index generator; the arguments of ι are evaluated unless the user 
explicitly indicates otherwise. This is based on the assumption that the 
correct formatting of “(m-1) ι{3 2} n+1” is 

    m-1 m m+1 ... n n+1 

not 

    m-1 m-1+1 m-1+2 ... n+1-1 n+1 . 

There are ways for the user to easily specify exactly which parts of any 
expression should be evaluated. 

Most of the scalar functions will accept symbolic arguments during 
evaluation, producing a symbolic result. PEN-MATH’s initial 
knowledge of symbolic evaluation is limited to handling the most 
commonly occurring cases, e.g. the use of ι illustrated above, and the 
definition of sine in Section 3.2. Facilities exist for the user to define 
new scalar operators or to extend existing ones. However, PEN is 
designed to run on a medium-sized personal computer, and the 
temptation to make PEN-MATH emulate a complete symbolic 
manipulator such as MACSYMA should be avoided. 

It is possible to store expressions in PEN-MATH variables. This 
facility is useful when an expression or array must appear at several 
places in a manuscript. It also allows complicated objects to be 
constructed from several short expressions rather than one long, 
unreadable expression. To illustrate this usage, consider the following 
three line PEN-MATH “program” 

    X ← 2 2 ρ a b c d 
    Y ← 2 2 ρ e f g h 
    (α(αX) × α(αY)) = α(αX +.×{;juxt} αY) 

whose formatted output is the canonical inner product: 

\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\times
\begin{pmatrix} e & f \\ g & h \end{pmatrix}
=
\begin{pmatrix} ae+bg & af+bh \\ ce+dg & cf+dh \end{pmatrix} .
\]

When the character “α” precedes an identifier, the identifier is treated as 
a meta-variable, i.e. during evaluation it is replaced by its value. When 
the character “α” precedes parentheses, it tells the formatter to make 
sure the parentheses appear in the formatted output, and it tells the 
evaluator to encapsulate the contents of the parentheses, making it look 
like a scalar to all operators outside of them. In contrast, without the α, 
“(αX) × (αY)” would be formatted as: 

    a×e  b×f 
    c×g  d×h . 

3.6. Editing of PEN-MATH 

Screen editing of formatted PEN-MATH expressions differs greatly 
from editing text. This is due to the tenuous relationship between 
formatted objects and input expressions. Consider the array in Figure 
3-1. If the user points to the array element “a_{3n}” what substring of the 
generating expression “A ↓ιm ↓ιn” is being referred to? There is no direct 
relationship between the elements of the array and the expression. At 
best, pointing at an arbitrary fragment of a formatted object will select a 
box whose content is either an array element or a subexpression 
containing the fragment. This box usually will not bear any obvious 
relationship to the generating expression. 

Under these circumstances, the opportunities for editing appear to be 
limited to two possibilities. (In the following, the linear expression 
generating the edited object is called its generator.) One approach 
would use the selected fragment as a template for changes to the entire 
object. An attempt would be made to identify the smallest substring of 
the generator producing the fragment. If this can be done (it may not be 
feasible on a small system), the substring would be replaced by a string 
reflecting changes made to the fragment. These changes may be made 
either by editing the screen image, requiring additional attempts to 
invert the formatting process, or by entering a PEN-MATH expression. 

The second approach would restrict changes to the selected fragment. 
The generator would be modified to store its output in a PEN-MATH 
variable. Changes to components of the edited object would cause a 
series of assignment statements modifying this variable to be generated. 
The editing process replaces the original generator with a list of 
expressions containing the modified generator and assignment 
statements. The changes could be made either by direct screen editing 
or by entering a PEN-MATH expression. 

Regardless of any editing, the math nodes of the document tree contain 
only linear PEN-MATH expressions. Consequently, due to the 
difficulty of inverting the format process and the desire to keep the 
generators as concise as possible, we recommend that most editing be 
done to the generators, not to the formatted objects. The user can edit 
and request sample formatting of expressions in a separate window, and 
incorporate them into the document under preparation. 

4. Design Motivation 

We felt that knowledge of interactive manuscript development systems 
was not sufficiently advanced for a powerful, but inflexible, design to 
succeed. Thus, PEN was designed with user extensibility in mind. To 
be truly extensible, a system has to be simple, even at the lowest levels. 
The user’s model of the system’s inner workings must be close to the 
actuality. We often sacrificed some power or efficiency to adhere to this 
goal. 

We decided that it was not necessary for a printed document to exactly 
reflect the image on the screen. We considered it important to get nicely 
printed output, but did not believe that an interactive system had to be 
an expert in the techniques used to generate it. A good document 
compiler can transform a “correct” sequence of commands into an 
attractively printed document. If systems like PEN can supply 
“correct” command sequences, then one should feel confident that the 
compiler will make the printed copy look good. 

This decision has far-reaching consequences: it allows an interactive 
system to concentrate on what it must do well and ignore difficulties 
arising from pagination, footnotes, figures, etc. In our case it freed 
PEN to provide only the sort of interactive formatting that “fit” a 
hierarchical document model. This covers most things a formatter 
does. Almost all text and mathematical formats can be formatted 
interactively, because they access information that is efficiently 
maintained within the model. The only major features of printed 
documents that are not supported are those related to printed page 
layout. In particular, we do not promise to maintain page breaks and 
the associated page numbers; such things are irrelevant during 
interactive preparation. It is possible to create a system whose screen 
exactly mimics the printed output. However, there are advantages to a 
system which does not. 

Speed is one advantage; producing nice output takes time. A document 
compiler that does not have to respond in fractions of a second can 
afford to put a great deal of effort into producing nice output. It can 
use global information to “optimize” the appearance of the printed 
manuscript [6]. We believe that promising fidelity between the screen 
and the printed page must either limit the quality of the printed output, 
or reduce the interactiveness of the system. No matter how good “fast" 
formatting methods become, those that spend more time can always do 
better. The constraints of an interactive system should not be applied to 
“batch” document compilers and vice-versa. 

Simplicity is another advantage. The user is not burdened with a 
command language that refers to quantities like page numbers that are 
irrelevant during manuscript development. A system that offers several 
unrelated “views” of a document, e.g. both paginated and hierarchical, 
must maintain both views. Updating these parallel representations 
makes the system more complex internally. One might argue that the 
internal structure does not matter, as the user never sees it. Experience 
has shown this to be false. Systems with complex internal structure 
show it to the user in many ways (e.g. the system is likely to be hard to 
extend and maintain.) 

This decision further simplifies the system by allowing a “software 
tools” approach to its design [5]. The document compiler, a potentially 
complex program, can be developed and maintained separately. 
Advances in formatting techniques that may be inappropriate for 
interactive use can still be used to advantage. PEN need only know the 
necessary commands to tell the document compiler how to attractively 
interpret the manuscript. 

5. Implementation in Lisp 

It should be noted that, like all other software systems, the set of 
functions to be included (and omitted) will be determined by both 
design and use. The former must not severely limit the contributions of 
the latter. Indeed the latter is a primary source of the former and points 
out the iterative process by which PEN will ultimately stabilize into a 
useful milieu for manuscript preparation. The programming tactics 
employed in producing PEN must not inhibit redesign and 
implementation. For both of these reasons Lisp has been chosen as the 
system language. To complete the implementation cycle, methods of 
mapping Lisp systems into functionally equivalent Pascal-like systems 
are being developed. 

We could not have designed PEN “in advance,” but had to be able to 
test out (and often discard) our ideas in a prototype. To make this 
process as painless as possible, we developed PEN in Lisp. Lisp is an 
unparalleled tool for developing interpreters [12], and PEN is basically 
an interpreter driven by the document tree. Lisps also have the best 
program development environments (at least for medium-sized 
programs like PEN). The lack in Lisp of a lush landscape of data 
structures requires one to treat data structures functionally and leads to 
object oriented programming even if one isn’t aware of it. The Lisp 
system that has resulted is quite slow, but it is fast enough to allow us to 
experiment. 


6. Acknowledgements 

We would like to acknowledge the support of the IBM Cambridge 
Scientific Center. John Ellis, the “E”, worked on the early versions of 
PEN. The Yale TOOLS group provided a superlative environment for 
multilingual programming. 


References 

[1] Bentley, J.L. and T.A. Ottmann. 
Algorithms for Reporting and Counting Geometric Intersections. 
IEEE Transactions on Computers 28(9):643-647, September, 1979. 

[2] Borkin, Sheldon A. and John M. Prager. 
Some Issues in the Design of an Editor-Formatter for Structured 
Documents. 
Technical Report, IBM Cambridge Scientific Center, November, 1978. 

[3] Engelbart, D.C. and W.K. English. 
A Research Center for Augmenting Human Intellect. 
In Proceedings of the 1968 FJCC, pages 395-410. AFIPS 
Conference Proceedings, Montvale N.J., December, 1968. 

[4] Kernighan, B.W. and L.L. Cherry. 
A System for Typesetting Mathematics. 
Communications of the ACM 18(3):182-193, March, 1975. 

[5] Kernighan, B.W. and P.J. Plauger. 
Software Tools. 
Addison-Wesley, Reading, Massachusetts, 1976. 

[6] Knuth, D.E. 
TEX: A System for Technical Text. 
Technical Report AIM-217, Stanford University, November, 1978. 

[7] Lampson, B.W. 
Alto User’s Guide: Bravo Manual. 
Xerox Palo Alto Research Center, 1978. 

[8] Newman, W.M. and R.F. Sproull. 
Principles of Interactive Computer Graphics. 
McGraw-Hill, New York, New York, 1979. 

[9] Ossanna, J.F. 
TROFF User’s Manual. 
Technical Report 54, Bell Laboratories, 1977. 

[10] Reid, B.K. and J.H. Walker. 
Scribe User Manual. 
CMU Computer Science Department, 1978. 

[11] Reid, B.K. 
A High-Level Approach to Computer Document Formatting. 
In Seventh Annual ACM Symposium on Principles of 
Programming Languages, pages 24-31. ACM, January, 1980. 

[12] Sandewall, Erik. 
Programming in the Interactive Environment: The Lisp 
Experience. 
Computing Surveys, ACM 10(1):35-72, March, 1978. 

[13] Thacker, C.P., E.M. McCreight, B.W. Lampson, R.F. Sproull, 
and D.R. Boggs. 
Alto: A Personal Computer. 
In D. Siewiorek, C.G. Bell, and A. Newell, editors, Computer 
Structures: Readings and Examples. McGraw-Hill, 1979. 

[14] Wood, Steven R. 
Z: The 95% Program Editor. 
In Proceedings of ACM SIGPLAN/SIGOA Symposium on 
Text Manipulation. ACM, June, 1981. 




JANUS: AN INTERACTIVE SYSTEM 
FOR DOCUMENT COMPOSITION 

Donald D. Chamberlin 
James C. King 
Donald R. Slutz 
Stephen J. P. Todd 
Bradford W. Wade 

IBM Research Laboratory 
San Jose, California 


ABSTRACT 

This paper describes the architecture of a proposed 
document composition system named JANUS, which is 
intended to provide support for authors of complex 
documents containing mixtures of text, line art, and 
tone art. The JANUS system is highly interactive, pro- 
viding authors with immediate feedback and direct elec- 
tronic control over page layouts, using a special two- 
display workstation. Authors communicate with the 
system by marking up their documents with high-level 
descriptive "tags". A tag definition language is provid- 
ed whereby new tags may be defined and the format of 
each tagged object may be controlled. 


INTRODUCTION 

This paper describes the architecture of a proposed 
document composition system named JANUS. The 
objective of JANUS is to provide interactive support 
for authors of complex documents containing mixtures 
of text, line art, and tone art. Typical examples of such 
documents are technical manuals for assembly, maintenance, 
and repair of equipment, which may have several 
illustrations on each page. In today’s technology, production 
of these documents typically involves a manual 
"pasteup" step in which illustrations are merged with 
text and individual page layouts are determined. Manual 
pasteup is labor-intensive and time-consuming, and it 
results in a long "turn-around time" between the author 
of a document and the finished product. Manual pasteup 
also greatly increases the difficulty and expense of 
making revisions to a document, or of maintaining multiple 
versions of a document (e.g., for various models of 
a piece of equipment). One of the objectives of JANUS 
is to bring the process of page layout under the 
control of an interactive computer, permitting the technical 
author to control placement of illustrations electronically 
and to view the finished pages immediately 
on a graphic display. This objective has been made 
feasible by the continuing decline in the cost of digital 
storage and processing, and by the advent of inexpensive 
all-points-addressable displays and printers which 
are capable of displaying and printing images as well as 
text in various fonts. 

Permission to copy without fee all or part of this material is 
granted provided that the copies are not made or distributed 
for direct commercial advantage, the ACM copyright notice 
and the title of the publication and its date appear, and notice 
is given that copying is by permission of the Association for 
Computing Machinery. To copy otherwise, or to republish, 
requires a fee and/or specific permission. 

© 1981 ACM 0-89791-043-5/81/0600/0082 $00.75 

In order to define a background for the architecture of 
the JANUS system, we will introduce three ways of 
classifying document composition systems. These three 
classifications may be thought of as orthogonal axes 
which define a three-dimensional space in which each 
document system is represented by a point (Figure 1). 
It is the objective of the JANUS project to build a sys- 
tem which is interactive, declarative, and capable of 
processing images as well as text. 

Batch vs. Interactive: The first classification distin- 
guishes systems which view document formatting as a 
"batch" job from those which are interactive. Batch- 
oriented systems, such as IBM’s SCRIPT/VS [1], Don- 
ald Knuth’s TEX [2], and Brian Reid’s SCRIBE [3], 
begin at the first page of a document and proceed to 
the last page, transforming an input file of text and 
markup commands into a formatted output file. This 
process takes place without the participation of the 
author, and the effect of a local change in the document 
can be seen only by reformatting the entire document. 
On the other hand, interactive systems such as 
Xerox’s ALTO computer with the BRAVO editor [4] 
permit the author to view and edit the formatted document 
directly, and to see the effects of his changes 
immediately in their local context. Interactive systems 
combine the traditionally separate functions of "editor" 
and "formatter" into a single system so that authors 
can interact with both functions without changing 
environments. 

Figure 1: Classifications of Document Systems 

Text Only vs. Images and Text: Our second classification 
distinguishes systems which process only text (possibly 
including multiple fonts) from systems which process 
images as well. A full-function system of the latter 
kind will include on-line digital storage for both line art 
(black and white images and graphics) and tone art 
(gray-scale or half-tone images, probably originating 
from a scanner). It should be possible to display either 
of these types of images at the author’s workstation, 
and to print them on the same medium as the text 
(either an all-points-addressable printer of adequate 
resolution or a photocomposer). The advantages of 
on-line storage of images are obvious. The time- 
consuming step of manual pasteup can be avoided, giv- 
ing the author direct control over page layouts with 
immediate feedback. Perhaps even more important, 
when the entire document is in digital form, it can be 
communicated electronically from one place to another; 
it can be archived on magnetic storage media; multiple 
versions can be maintained under computer control; and 
the document can be printed on demand in 
"customized" versions for different users or purposes. 
A commercial system having many of these capabilities 
is the "AIDS" system, marketed by Information Inter- 
national, Inc. [5]. The SCRIBE system at Carnegie- 
Mellon University [3] also has a capability for imbed- 
ding digitized images in documents. 


Procedural vs. Declarative: Our third classification is 
based on the level of the commands by which the user 
communicates with the system, and on the degree of 
understanding which the system has of the structure of 
the document. In a "procedural" system, the author 
controls formatting by low-level commands which direct 
specific actions, such as "skip two lines", "change 
fonts", "indent one inch", etc. In such a system, the 
formatter obeys the author’s commands without any 
understanding of the reason for the commands (e.g., a 
change to italic font might represent an emphasized 
phrase, a book title, a section heading, etc.). In the past 
year or two, a few systems have become available 
which have a greater understanding of the documents 
they are formatting, and therefore can communicate 
with the author at a higher level and provide more help 
in accomplishing his purpose. Examples of such sys- 
tems, which we will refer to as "declarative" systems, 
are IBM’s GML [6] and Brian Reid’s SCRIBE [3]. An 
author communicates with a declarative system by 
marking up his document with "tags" which identify 
parts of the document such as footnotes, chapter head- 
ings, numbered lists, etc. Each "tag" is then interpret- 
ed by the system by means of a possibly complex pro- 
cedure which produces the correct format for the tag- 
ged object. Declarative systems provide authors with 
the benefits of a high-level markup language, in which 
complex formatting procedures can be invoked by sim- 
ple tags; they also provide uniformity of style across 
documents, since, for example, the appearance of a 
footnote is controlled by the system rather than by 
individual authors. Declarative systems also make 
marked-up documents independent of any specific out- 
put device (e.g., a tag for a "book title" might result in 
italics on one output device and in underlining on an- 
other). 
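
As a rough illustration of device independence, the following Python sketch maps a tag to a device-specific rendering. The device names, the marker strings, and the `render` function are hypothetical; they are not taken from GML or SCRIBE.

```python
# Hypothetical style tables, one per output device. The tag names the
# role of the text; each device decides how that role is displayed.
STYLES = {
    "photocomposer": {"book-title": ("<i>", "</i>")},  # italics available
    "line-printer": {"book-title": ("_", "_")},        # underline instead
}


def render(tag: str, text: str, device: str) -> str:
    """Wrap text in the device-specific markers for the given tag."""
    before, after = STYLES[device][tag]
    return f"{before}{text}{after}"
```

The marked-up document only ever says "book-title"; changing the output device changes the table entry that is consulted, not the document.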

An example document which illustrates the advantages 
of a declarative system is shown in Figure 2. We show 
both the formatted document and the "markup" from 
which it was derived. The document contains a num- 
bered list and a footnote. In a procedural system, the 
author of this document would need to number the list 
items himself, and would control the formatting 
(spacing, indentation, placement on the page, etc.) of 
the list items and footnote by dozens of low-level com- 
mands. In a declarative system the author simply iden- 
tifies the parts of the document by means of tags such 
as the ":item" and ":footnote" tags illustrated in Figure 
2 (the leading colon is used to distinguish tags from 
text). The difference between a procedural and a dec- 
larative system becomes even more dramatic when we 
consider the process of editing a document. If an au- 
thor needs to add a new item to the top of the list, a 
procedural system would require him to renumber the 
list items manually, both in the list and in the 
footnote. Worse yet, the new item might cause the 
footnote-reference to move forward to a new page, 
causing even more work for the author since the system 
has no understanding of the correlation between a foot- 







Formatted document:

    The following procedure 
    is recommended for 
    changing a light bulb: 

    1. Remove chandelier.† 

    2. Unscrew old light 
       bulb. 

    3. Screw new light bulb 
       in place. 

    4. Replace chandelier. 

    Care should be taken 
    that the light is turned 
    off during the installa- 
    tion procedure. 


    Steps 1 and 4 may be 
    omitted if no chandelier 
    is installed. 

Original markup:

    :paragraph. 
    The following procedure is 
    recommended for changing a 
    light bulb: 
    :numbered-list. 
    :item. 
    Remove chandelier. 
    :footnote. 
    Steps 1 and 4 may be omitted 
    if no chandelier is installed. 
    :end-footnote. 
    :item. 
    Unscrew old light bulb. 
    :item. 
    Screw new light bulb in place. 
    :item. 
    Replace chandelier. 
    :end-list. 
    :paragraph. 
    Care should be taken that the 
    light is turned off during the 
    installation procedure. 


Figure 2: Example of a Marked-up Document 


note and its reference. In a declarative system, these 
problems are automatically solved by the system when 
the author inserts a new list item identified by the 
":item" tag. 
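
To make the renumbering argument concrete, here is a minimal Python sketch of an interpreter for the handful of tags used in Figure 2. It is an assumption-laden toy, not the JANUS formatter: the function name `format_markup` and the "*" reference mark are invented for illustration, and it only assigns item numbers and collects footnotes (it does not update numbers quoted inside footnote text, and it does no page placement).

```python
def format_markup(lines):
    """Format a tiny subset of the Figure 2 tags (illustrative only)."""
    body, footnotes = [], []
    item_no = 0          # list item number, assigned by the system
    in_footnote = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line == ":numbered-list.":
            item_no = 0
        elif line == ":item.":
            item_no += 1
            body.append(f"{item_no}.")
        elif line == ":footnote.":
            in_footnote = True
            footnotes.append("")
            body[-1] += "*"      # reference mark on the preceding text
        elif line == ":end-footnote.":
            in_footnote = False
        elif line in (":paragraph.", ":end-list."):
            body.append("")      # start a new block
        elif in_footnote:
            footnotes[-1] = (footnotes[-1] + " " + line).strip()
        elif body:
            body[-1] = (body[-1] + " " + line).strip()
        else:
            body.append(line)
    return [block for block in body if block], footnotes
```

Inserting a new ":item." at the top of the list changes only the markup; the numbers in the output are recomputed on the next formatting pass.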

The critical problem facing the designer of a declarative 
document composition system is how to insert into the 
system an understanding of various tags and how they 
should be formatted. It is clear that some interface 
must be provided for users to input new tag-definitions 
or to modify the meanings of existing tags. This tag- 
definition interface may or may not be made available 
to individual authors, according to the editorial policy 
of the organization; however, some useable means must 
be provided whereby types of documents may be de- 
fined, and the various parts of these documents can be 
described and edited. It is our opinion that the useabil- 
ity of the tag-definition interface is critical in the suc- 
cess of any declarative formatting system. Details of 
the JANUS approach to defining tags will be given in a 
later section. 


THE JANUS WORKSTATION 

Choice of a workstation for the JANUS system was 
dictated by the objectives discussed above. In order to 
be interactive, the system must be capable of displaying 
formatted pages; hence the workstation must include an 
all-points-addressable display of adequate size and reso- 
lution for displaying full-size pages of images and text. 
However, in order to be declarative, the system should 
also display the "markup" tags which define the struc- 
ture of the document. It is by means of these high- 
level tags that the author gives the system the informa- 
tion it needs to format the document; therefore the tags 
should be visible and available for editing. One alter- 
native would be to somehow overlay the tags on the 
display of the formatted page, using a different color or 
some other means to distinguish tags from text. Howev- 
er, we feel that this would lead to a confusing display 
and would place unacceptable constraints on the space 
available for display of tags. Therefore, we have adopt- 
ed a "two-screen" workstation, in which the original 
"marked-up" document and the final formatted docu- 
ment are displayed side by side, with the same portion 
of the document visible simultaneously on the two 
screens. As the author edits the text and tags visible on 
the "markup" display, he may invoke a command to see 
the effects of his actions on the final document in the 
"formatted page" display. As the author moves from 
one place to another in the markup file, the formatted 
page display tracks his position in the final document. 
The two-display workstation suggested the name for 
our project, which is named after the two-faced Roman 
god Janus. 




Figure 3: JANUS Workstation 

The two-screen workstation selected for use in the 
JANUS project is the IBM 3277 Graphics Attachment 
[7]. This workstation consists of an IBM 3277 display 
terminal, which provides a 24-line CRT on which the 
markup file may be displayed and edited, combined 
with a Tektronix 618 nineteen-inch direct-view storage 
tube, which provides a full-page-size all-points- 
addressable screen for display of the formatted docu- 
ment. The workstation also provides a joystick which 
can be used for "pointing" to specific points on the 
storage-tube display, which is necessary for some of the 
interactive commands to be described later. The JANUS 
workstation is illustrated in Figure 3. 

The initial construction of the JANUS prototype will be 
done on an IBM 370 under the VM/CMS operating 
system. Our source of images will be an ECRM Auto- 
kon 8400 scanner, controlled by an IBM Series/1 com- 
puter, which buffers scanned images and forwards them 
to the 370 via a teleprocessing link. The formatted 
documents may be directed to an Autologic APS-5 
photocomposer, adjusted to a resolution of 800 
pels/inch. For quick draft output, documents may also 
be directed to a Versatec raster printer which is capable 
of printing images and text at a resolution of 200 
pels/inch. 


TAG DEFINITION LANGUAGE 

The problem of formatting pages in a document is simi- 
lar in some ways to the well-known "bin-packing" 
problem of packing objects of various sizes into a fixed 
space. A page formatter has some flexibility not availa- 
ble in the bin-packing problem, in that some of the 
objects on the page (e.g., columns of text) can be bro- 
ken across columns or pages (of course, care must be 
taken not to cause a break in the middle of a table or 
equation or similar block of related material). On the 
other hand, the page formatter must cope with some 
additional constraints: the objects to be placed on the 
page have a pre-defined ordering, and have certain 
requirements for placement, both with respect to the 
page boundaries (e.g., footnotes must appear at the 
bottom), and with respect to each other (e.g., a foot- 
note should appear on the same page as its reference). 
A declarative document system must not only solve the 
page formatting problem, but must provide a language 
which permits authors and editors to specify, by defin- 
ing suitable tags, their own specific solutions to the 
formatting problem. As stated above, we believe that 
design of the language for defining tags is the central 
problem in the implementation of a declarative docu- 
ment system. 





The following is a partial list of the requirements which 
must be satisfied by a tag definition language: 

• The language must be powerful enough to de- 
scribe complex objects such as footnotes, num- 
bered lists, bibliographic references, tables of con- 
tents, etc. 

• The language must provide the editor with flexi- 
bility in defining alternative formats (e.g., a single 
wired-in format for tables of contents would not 
be acceptable). 

• The language should be simple enough to be used 
by technical authors and editors after a reasonable 
amount of training. 

• Wherever possible, the definition of a tag should 
be independent of other tags which may occur in 
the same document. For example, a paragraph 
may occur inside a numbered list, a footnote, an 
abstract, etc.; hence the tag-definition for a para- 
graph should ideally be independent of the envi- 
ronment in which it is used. 

• Tag definitions should be written in such a way 
that they can accept additional "directions" from 
the author during the formatting process. For 
example, an author may wish to force placement 
of a figure in a specific location (interactive com- 
mands for this purpose are described in the next 
section). In such a case, the tags for the other 
objects on the page should be able to format 
"around" the figure. The mandated placement of 
the figure will be seen by the tags as an additional 
constraint to be dealt with in solving the page for- 
matting problem. 


GALLEYS 

The JANUS project has chosen to attack the page for- 
matting problem in two parts: (1) the placement of text 
and other materials into long "galleys" similar to those 
used in publishing, and (2) the placement of these 
"galleys" onto pages. This paper will include some 
simple examples of language features which might be 
provided to a user for controlling these two processes. 

A "galley" may be thought of as an ordered sequence 
of "members", as illustrated in Figure 4. Each member 
is an indivisible object ready for placement on a page. 
Typically, a galley member will contain one line of jus- 
tified text. In some cases, a member will contain a 
larger object, such as a table or several lines of text 
which the author requires to be kept together. Other 
possible galley members are figures and chapter head- 
ings. The properties of galley-members are described 
below: 



Figure 4: A Galley 

• HEIGHT and WIDTH: For the time being, we will 
make the simplifying assumption that each galley 
has a natural width (representing the width of the 
columns into which it will be placed), and that all 
members of the galley are either galley-width or 
page-width. 

• SEQUENCE: This property determines the rela- 
tionship between this member and other nearby 
members. The possible values are as follows: 

FIXED: When the galley is placed onto 
pages, this member must be placed in exactly 
the same sequence with respect to its neigh- 
bors as that occurring in the galley. If, for 
example, this member must move forward to 
a new page because there is not enough 
space on the current page, the gap will be 
filled with white space. An example of a 
"fixed" galley member is a chapter heading. 







FLOATING: If necessary, a galley-member 
with SEQUENCE=FLOATING may be 
placed in the document at a later place than 
its indicated position in the galley. If this 
occurs, other galley-members may "fill in" 
behind the member which floated ahead. An 
example of a "floating" galley member is a 
figure. 

• POSITION: This property represents the preference 
of a galley-member for a particular placement 
with respect to the boundaries of the page. 
The possible values are as follows: 

HERE(n): The member should be placed on 
the page as soon as possible. The parameter 
n represents the amount of vertical space re- 
quired for placement of the member. For 
example, a section heading might specify 
HERE(4 inches) to ensure that at least four 
inches are available for beginning the new 
section; otherwise a page break is forced. If 
the parameter n is not specified, the default 
value is the height of the member. 

TOP: The member requires placement at the 
top of the page (example: a running top title 
or page number). 

BOTTOM: The member requires placement 
at the bottom of the page (example: a foot- 
note). 

EDGE: The member requires placement at 
either the top or the bottom of the page (this 
option might be used for figures). 

• CORRELATION: A document may have more 
than one galley flowing through it, and a member 
of one galley may have a correlation (placement 
dependency) with respect to a member of another 
galley. For example, body text and footnotes 
might be formatted in two separate galleys, called 
TGALLEY and FGALLEY respectively. In this 
case, a correlation might exist between the first 
line of the footnote in FGALLEY and the line 
containing the footnote reference in TGALLEY. 
The correlation would indicate that the 
FGALLEY-member should be placed on the same 
page as the TGALLEY-member (or in any case, 
not on an earlier page). 
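
The properties above can be collected into a small record. The following Python sketch is only one possible encoding, written under our own assumptions: the class and field names (`Member`, `here_space`, `tied_to`, `page`) are hypothetical, and the JANUS syntax for these properties is the one shown in Appendix B, not this one.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Sequence(Enum):
    FIXED = "fixed"
    FLOATING = "floating"


class Position(Enum):
    HERE = "here"
    TOP = "top"
    BOTTOM = "bottom"
    EDGE = "edge"


@dataclass
class Member:
    height: float
    width: float
    sequence: Sequence = Sequence.FIXED
    position: Position = Position.HERE
    here_space: Optional[float] = None  # the n in HERE(n)
    tied_to: Optional["Member"] = None  # CORRELATION partner in another galley
    page: Optional[int] = None          # page chosen by the formatter, if any

    def space_needed(self) -> float:
        """Space that must be free for a HERE member; defaults to its height."""
        return self.height if self.here_space is None else self.here_space


def correlation_ok(m: Member) -> bool:
    """A tied member must not land on an earlier page than its partner."""
    if m.tied_to is None or m.page is None or m.tied_to.page is None:
        return True
    return m.page >= m.tied_to.page
```

A formatter could call `correlation_ok` on each footnote member after a trial placement and move the member (or its reference line) forward when the check fails.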

The second step in the formatting process is the place- 
ment of the galley(s) onto actual pages. This process is 
controlled by a user-specified "page template" which 
contains boxes into which the galleys are to be placed. 
A given page template may receive one or more galleys. 
The ordered sequence of boxes on a page-template 
which is intended to receive a particular galley is called 
a "galley-bed." A simple example of a page template 
containing two galley-beds is shown in Appendix A. 

Like the members of a galley, the boxes on a page- 
template have user-specified properties which control 
the way in which galleys are broken into pieces and 
placed on the page. The following is a summary of 
these properties: 

• POSITION and SIZE: The position of each box is 
specified with respect to the upper left-hand cor- 
ner of its "parent" box (the box in which it is 
contained). The parent box may be the page it- 
self, or a box which represents a subset of the 
page. The size of a box, and its position within its 
parent, are determined by specifying two of the 
three parameters (top, bottom, and height), and 
two of the three parameters (left side, right side, 
and width). The position specified is the initial 
position of the box when the page is newly creat- 
ed. As the page fills with text and other materials, 
the box may move about, as specified below. By 
convention, if a "parent" box moves, all the boxes 
contained in it move also. The height and width 
specified for a box are also interpreted as repre- 
senting the initial size of the box. As the page 
fills, boxes may grow and shrink, as described be- 
low. 

• GALLEY-BED: An important property of each 
box is its membership in a particular galley-bed, 
and its sequence with respect to other boxes in the 
galley-bed. This applies especially with multi- 
column formatting. 

• GROWTH: This property specifies what happens 
when galley-members are placed into the box, and 
it becomes full. The options are as follows: 

NONE: The box retains its original size. Ad- 
ditional galley-members must be placed in 
the next box in the galley-bed. If no more 
boxes are available on the current page, a 
new page is automatically generated. 

UP: The box grows in an upward direction to 
make room for new material, "pushing" on 
other boxes until it meets a rigid barrier 
which prevents further growth. Such a barri- 
er might be the boundary of the page, or a 
box which refuses to be "pushed." For ex- 
ample, a suitable galley-bed for a footnote 
galley might be an upward-growing box of 
initial height zero. 

DOWN: This option is similar to UP, except 
that the box grows in a downward direction 
until a barrier is reached. 





• RETREAT: This property specifies what happens 
when another box, in an attempt to grow, 
"pushes" on this box. The options are as follows: 

NONE: This box refuses to be pushed. The 
pushing box is prevented from further 
growth. 

SHRINK: This box shrinks on whatever side 
is being pushed. The shrinking may cause 
galley members previously placed in the 
shrinking box to flow into the next box in 
the same galley-bed. This process of galley 
members flowing forward propagates through 
the galley-bed and may trigger the creation 
of a new page. 

SLIDE: This box retains its current size, but 
slides upward or downward within its parent 
box in order to make more room for the box 
which is growing. As it slides, this box may 
"push" on other boxes until a rigid barrier is 
met. 
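
The interplay of GROWTH and RETREAT can be sketched in one dimension. The Python fragment below is a simplification under stated assumptions: it models only GROWTH=UP against a single neighbor with RETREAT=NONE or RETREAT=SHRINK, it omits SLIDE, and it does not model galley-members flowing out of a box that shrinks. The names `Box` and `grow_up` are ours.

```python
from dataclasses import dataclass


@dataclass
class Box:
    top: float             # distance from top of page
    bottom: float
    retreat: str = "NONE"  # "NONE" or "SHRINK" (SLIDE omitted here)

    @property
    def height(self) -> float:
        return self.bottom - self.top


def grow_up(box: Box, above: Box, amount: float) -> float:
    """Grow `box` upward by up to `amount`, pushing on the box above it.

    Assumes `above` sits directly on top of `box`. Returns the growth
    actually achieved.
    """
    if above.retreat == "NONE":
        return 0.0                       # rigid barrier: growth is refused
    granted = min(amount, above.height)  # cannot shrink below zero height
    above.bottom -= granted              # the pushed side retreats
    box.top -= granted
    return granted
```

For example, a footnote box of initial height zero that sits under a text column with RETREAT=SHRINK gains exactly the height that the column gives up.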

In order to specify the formatting for a new type of 
document, an author or editor must do two things: (1) 
Define the page-templates to be used in the document 
(the document may contain several different types of 
pages); and (2) Define each tag to be used in the docu- 
ment. The syntax to be used for these definitions is not 
yet completely designed, but a possible example of a 
simple page-template and three tag-definitions is given 
in Appendices A and B. The tag-definitions are written 
in a PL/I-like language with commands to create 
galley-members and fill them with text and other mate- 
rial. The FILL command in Appendix B may be used 
to fill either a galley or a galley-member, performing 
hyphenation and justification automatically. If FILL is 
used with the name of a galley, it produces one galley- 
member for each line of formatted text. If FILL is 
used with the name of a galley-member, it fills the indi- 
cated member with formatted lines. The source of 
material for the FILL command may be any of the 
following: 

• INPUT: Text taken from the user’s original mar- 
kup file, from the current position until the next 
tag is encountered. 

• STRING <expression>: A character string com- 
puted by the tag-definition. 

• COMMANDS: The tag-definition may issue com- 
mands which control the hyphenation and justifi- 
cation program which is filling the galley. These 
commands do not create galley members of their 
own. However, they may insert white space into 
the galley, specify a change of fonts, specify right, 
left, or center quadding, etc. 
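
The two behaviors of FILL can be imitated with Python's standard `textwrap` module. This is a sketch only: it measures width in characters rather than typographic units, it performs no hyphenation or justification, and the function names are hypothetical.

```python
import textwrap


def fill_galley(galley: list, text: str, width: int) -> None:
    """FILL <galley>: append one member (a one-line list) per formatted line."""
    for line in textwrap.wrap(text, width):
        galley.append([line])


def fill_member(member: list, text: str, width: int) -> None:
    """FILL <member>: add all formatted lines to one indivisible member."""
    member.extend(textwrap.wrap(text, width))
```

Filling a galley yields many small members which the page formatter may separate at column or page breaks; filling a member yields a single object whose lines stay together.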


Once the tags and page-templates have been defined 
for a given type of document by an experienced editor, 
less sophisticated users can create and edit individual 
documents without having a detailed understanding of 
how the tags are defined. When the interactive JANUS 
formatter is invoked on a given document, it scans the 
"markup" file looking for tags. Whenever a tag is en- 
countered, its tag-definition is invoked, which causes 
galley-members to be created and filled with material. 
As the galley-members are created, they are placed 
automatically into galley-beds as specified by the cur- 
rent page template (which is chosen by the tag- 
definition). When page-width objects are encountered 
in the galley, the columns of the galley-bed are 
"balanced" above the page-width object, and normal 
placement of galley-members resumes below the wide 
object. Whenever a galley-bed becomes full and is 
unable to grow, a new page is created. The formatter 
saves its state in a side-file at each page boundary. 
This eliminates unnecessary reformatting as the user 
moves around the markup file. 
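
The placement loop just described can be approximated as follows. This Python sketch assumes the simplest case only: every box has GROWTH=NONE, every member has SEQUENCE=FIXED, and there are no page-width objects or saved formatter state. The function `paginate` and its arguments are illustrative names, not part of JANUS.

```python
def paginate(member_heights, box_heights):
    """Place members, in order, into the boxes of one galley-bed.

    `box_heights` lists the boxes of a page template in galley-bed
    order. Returns a list of pages; each page is a list of boxes;
    each box is a list of member indices.
    """
    pages = []
    box = len(box_heights)   # forces creation of the first page
    free = 0
    for i, h in enumerate(member_heights):
        if h > max(box_heights):
            raise ValueError(f"member {i} is taller than every box")
        while h > free:      # current box is full: move to the next one
            box += 1
            if box >= len(box_heights):  # no boxes left: start a new page
                pages.append([[] for _ in box_heights])
                box = 0
            free = box_heights[box]
        pages[-1][box].append(i)
        free -= h
    return pages
```

With five one-unit members and a template of two boxes of height two, the first four members fill the two columns of page one and the fifth starts page two.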

One problem with the use of galleys is where text needs 
to be flowed around a figure. Until the galley is packed 
on the page, it is not clear which part of the galley 
needs to be narrowed. We need some feedback across 
the galley interface. 


INTERACTIVE PAGE LAYOUT 

Ordinarily, we expect that the information contained in 
the tag-definitions and the page-templates will enable 
the formatter to do an acceptable job of page layout. 


(Diagram labels: FIG. 1, TGALLEY, TGALLEY, FIG. 2)


Figure 5: Layout Mode Display 


(Diagram labels: Text Editor, Markup File, ECRM Scanner, Image Files, Formatter, Formatted Pages, To Printers)


Figure 6: Components of the JANUS System 


However, for very complex documents, the user may 
occasionally wish to overrule the system’s decisions and 
take direct control of the format of a page, e.g., by 
specifying a particular placement for a figure. JANUS will 
permit users this degree of direct control by means of a 
feature called "layout mode." In layout mode, the 
output display shows a graphic representation of the 
galley-beds on the current page, together with the cur- 
rent position of all objects with the property 
SEQUENCE=FLOATING. An example of a layout 
mode display is shown in Figure 5. The user may repo- 
sition the floating objects on the page by pointing to 
them with a joystick-controlled cursor and issuing 
graphic commands. The following is a partial list of 
layout mode commands, which are invoked by single 
keystrokes in conjunction with a cursor position: 


I: (IDENTIFY) The indicated object becomes attached 
to the cursor and may be moved to a new 
position on the page (applies only to floating ob- 
jects). 

M: (MOVE) The object attached to the cursor be- 
comes detached and remains at its current posi- 
tion. 

D: (DROP) The indicated object is dropped from the 
current page, and pushed forward onto a subse- 
quent page (applies only to floating objects). 

V: (VERTICAL SPLIT) The indicated box on the 
page is split into two boxes by a vertical line at 
the indicated position. 

H: (HORIZONTAL SPLIT) The indicated box on the 
page is split into two boxes by a horizontal line at 
the indicated position. 

C: (COMBINE) The indicated two boxes are com- 
bined into one (cursor must be placed on the 
boundary between two boxes of compatible 
types). 
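
The geometric effect of the V, H, and C commands can be sketched with plain rectangles. In this Python fragment the `Rect` type and the function names are our own, and "compatible" is interpreted narrowly as "sharing one full edge", which is an assumption rather than the JANUS definition.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def split_vertical(box: Rect, x: float) -> Tuple[Rect, Rect]:
    """V: split a box into left and right parts at cursor position x."""
    if not box.left < x < box.right:
        raise ValueError("cursor is not inside the box")
    return (Rect(box.left, box.top, x, box.bottom),
            Rect(x, box.top, box.right, box.bottom))


def split_horizontal(box: Rect, y: float) -> Tuple[Rect, Rect]:
    """H: split a box into upper and lower parts at cursor position y."""
    if not box.top < y < box.bottom:
        raise ValueError("cursor is not inside the box")
    return (Rect(box.left, box.top, box.right, y),
            Rect(box.left, y, box.right, box.bottom))


def combine(a: Rect, b: Rect) -> Rect:
    """C: merge two boxes; `a` must be left of or above `b`."""
    if (a.top, a.bottom) == (b.top, b.bottom) and a.right == b.left:
        return Rect(a.left, a.top, b.right, a.bottom)
    if (a.left, a.right) == (b.left, b.right) and a.bottom == b.top:
        return Rect(a.left, a.top, a.right, b.bottom)
    raise ValueError("boxes do not share a full edge")
```

Splitting a box and then combining the two halves returns the original box, which is a convenient sanity check for an interactive implementation.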

W: (WHITE SPACE) The indicated box is forced to 
contain white space. 

When the user leaves layout mode, the interactive for- 
matter is invoked to format the galleys around the new 
positions of floating objects and white space specified 
by the user, and the newly formatted page is displayed. 
The user may toggle back and forth between layout 
mode and interactive editing mode as many times as 




necessary until the page is satisfactory. Of course, 
changes made to the format of a given page may affect 
the format of subsequent pages, so the user is advised 
to proceed through the document from front to back 
when making final adjustments to page layouts. Wherever 
possible, changes made during a LAYOUT session 
will be reapplied after small changes to the source. The 
interaction between layout mode and the rest of the 
JANUS editor/formatter system is illustrated in Figure 6. 


SUMMARY 

We have discussed the architecture of a document com- 
position system which offers the following principal 
features: 

• It is highly interactive, providing authors with 
immediate feedback by means of an all-points- 
addressable display. 

• It is capable of formatting complex documents 
containing mixtures of text, line art, and images. 

• It allows users to mark up their documents with 
high-level descriptive tags. 

• It provides a useable interface for defining the 
meanings of tags and specifying how various ob- 
jects are to be placed on pages. 

• It provides a two-screen workstation in which the 
user can see both his original markup and the re- 
sulting formatted pages simultaneously. 

• It provides a set of graphic commands by means 
of which the user can take direct control over page 
layout when necessary. 

At present, the hardware and software tools necessary 
for construction of the JANUS prototype are in place. 
We are able to scan both black-and-white and gray- 
scale images, to display them on the 3277GA worksta- 
tion, and to print them on the Versatec printer and the 
APS-5 photocomposer. A preliminary version of the 
graphic command interpreter for layout mode has been 
implemented. However, the syntax of the JANUS languages 
for document markup and tag definitions has 
not yet been fully specified, and implementation of 
these languages has not yet begun. In addition, there 
are several important aspects of a complete document 
system which our project intends to address, but which 
are not described in this paper. These include a graphic 
editor for line art, support for tables and equations, and 
interaction between the JANUS formatter and a data- 
base system which will permit the imbedding of material 
such as bibliographic references and computer- 
generated data. 

REFERENCES 


[1] Document Composition Facility: User’s Guide. 
IBM Publication No. SH20-9161 (April 
1980). 

[2] D. E. Knuth. TEX: A System for Technical 
Text. American Mathematical Society, Provi- 
dence, R.I., 1979. 

[3] B. K. Reid. "A High-Level Approach to Com- 
puter Document Formatting." Conference Re- 
cord of the Seventh Annual ACM Symposium on 
Principles of Programming Languages, Las Ve- 
gas, NV (January 1980), pp. 24-31. 

[4] C. P. Thacker, E. M. McCreight, B. W. 
Lampson, R. F. Sproull, and D. R. Boggs. 
Alto: A Personal Computer. Technical Report 
CSL-79-11, Xerox Palo Alto Research Center 
(August 1979). 

[5] The Seybold Report, Vol. 8, No. 13 (March 
1979). Seybold Publications, Inc., Media, PA. 

[6] Document Composition Facility, Generalized 
Markup Language: Concepts and Design Guide. 
IBM Publication No. SH20-9188 (April 
1980). 

[7] IBM 3277 Display Station, Graphics Attach- 
ment RPQ 7H0284, Custom Feature Descrip- 
tion. IBM Publication No. GA33-3039 (July 
1979). 



APPENDIX A: A PAGE TEMPLATE 


TWOCOL: PAGETYPE(PAGEHEIGHT, PAGEWIDTH, 
                 MARGIN, GUTTER); 

   DCL (PAGEHEIGHT, PAGEWIDTH, GUTTER, 
        MARGIN, COLWIDTH) REAL GLOBAL; 
   DCL (P, C1, C2, FOOT) BOX; 
   DCL (TGALLEY, FGALLEY) GALLEY; 

   COLWIDTH = (PAGEWIDTH - 2*MARGIN - GUTTER) / 2; 

   P = BOX(TYPE=PAGE, 
           HEIGHT=PAGEHEIGHT, WIDTH=PAGEWIDTH); 
           /* P is the page */ 

   FOOT = BOX(GROWTH=UP, RETREAT=NONE, PARENT=P, 
              BOTTOM=HEIGHT(P)-MARGIN, HEIGHT=0, 
              LEFT=MARGIN, RIGHT=WIDTH(P)-MARGIN, 
              GALLEY=FGALLEY); 
              /* To be used by footnotes */ 

   C1 = BOX(GROWTH=NONE, RETREAT=SHRINK, PARENT=P, 
            TOP=MARGIN, BOTTOM=TOP(FOOT), 
            LEFT=MARGIN, WIDTH=COLWIDTH, 
            GALLEY=TGALLEY); 

   C2 = BOX(GROWTH=NONE, RETREAT=SHRINK, PARENT=P, 
            TOP=MARGIN, BOTTOM=TOP(FOOT), 
            RIGHT=WIDTH(P)-MARGIN, WIDTH=COLWIDTH, 
            GALLEY=TGALLEY); 
            /* C1 and C2 are the columns */ 

END TWOCOL; 
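
The arithmetic in this template can be checked with a few lines of Python. The sketch below assumes the "two of three parameters" rule described earlier for box position and size; the helper names and the sample dimensions in the usage note (an 8.5-inch page with 1-inch margins and a 0.5-inch gutter) are our own choices, not values from the JANUS design.

```python
def resolve_extent(start=None, end=None, size=None):
    """Given any two of (start, end, size), return all three.

    Applies equally to (top, bottom, height) and (left, right, width).
    """
    if [start, end, size].count(None) != 1:
        raise ValueError("specify exactly two of the three parameters")
    if size is None:
        size = end - start
    elif end is None:
        end = start + size
    else:
        start = end - size
    return start, end, size


def column_width(page_width, margin, gutter):
    """COLWIDTH as computed in TWOCOL: two equal columns plus a gutter."""
    return (page_width - 2 * margin - gutter) / 2
```

With the sample dimensions, each column is 3 inches wide; C1 (LEFT and WIDTH given) spans 1.0 to 4.0 inches and C2 (RIGHT and WIDTH given) spans 4.5 to 7.5 inches, leaving the 0.5-inch gutter between them.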


APPENDIX B: TAG DEFINITIONS 


SECTION: TAG(SECNAME); 
   /* Page-width section heading */ 

   H = MEMBER(GALLEY=TGALLEY, 
              WIDTH=PAGEWIDTH-2*MARGIN, 
              SEQUENCE=FIXED, POSITION=HERE(31)); 

   FILL H WITH COMMAND 'space 1 inch'; 
   FILL H WITH COMMAND 'font=press-bold(18 point)'; 
   FILL H WITH COMMAND 'quad center'; 
   FILL H WITH STRING SECNAME; 
   FILL H WITH COMMAND 'restore normal justification'; 
   FILL H WITH COMMAND 'restore previous font'; 
   FILL H WITH COMMAND 'space 0.5 inch'; 

   CLOSE H;   /* The galley-member is closed 
                 and added to the galley */ 

END SECTION; 


FIG: TAG(FIGNAME); 
   /* A figure with the given name, 
      to be retrieved from a library */ 

   DCL FIGNO INTEGER GLOBAL INIT(0); 
      /* Current Figure No. */ 
   DCL (W, WW) REAL; 

   FIGNO = FIGNO + 1; 
   W = FIGWIDTH(FIGNAME); 
      /* Function examines the figure 
         and returns its width */ 

   IF W<=COLWIDTH 
      THEN WW=COLWIDTH; 
   ELSE IF W<=PAGEWIDTH-2*MARGIN 
      THEN WW=PAGEWIDTH-2*MARGIN; 
   ELSE ERROR('Figure ' || FIGNAME || 
              ' too wide for page'); 

   F = MEMBER(GALLEY=TGALLEY, SEQUENCE=FLOATING, 
              POSITION=EDGE, WIDTH=WW); 

   FILL F WITH COMMAND 'space 0.2 inch'; 
   FILL F WITH IMAGE FIGNAME (CENTERED); 
   FILL F WITH COMMAND 'space 0.2 inch'; 
   FILL F WITH COMMAND 'quad center'; 
   FILL F WITH STRING ('Fig. ' || FIGNO || ': ' || FIGNAME); 
   FILL F WITH COMMAND 'restore normal justification'; 
   FILL F WITH COMMAND 'space 0.2 inch'; 

   CLOSE F;   /* The galley-member is closed 
                 and added to the galley */ 

END FIG; 
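
The width decision in the FIG tag reduces to a three-way choice, restated here in Python as a cross-check. The function name and the use of an exception in place of ERROR are our assumptions.

```python
def member_width(fig_width, col_width, page_width, margin):
    """Pick the galley-member width for a figure, as the FIG tag does."""
    full_width = page_width - 2 * margin
    if fig_width <= col_width:
        return col_width    # figure fits within one column
    if fig_width <= full_width:
        return full_width   # figure spans the page between the margins
    raise ValueError("figure too wide for page")
```

This is the point at which a member becomes either galley-width or page-width, the two cases allowed by the simplifying assumption made for HEIGHT and WIDTH.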


FOOTNOTE: TAG; 
   /* Generates footnote reference 
      as well as footnote itself */ 

   DCL FNN INTEGER GLOBAL INIT(0); 
      /* Current footnote no. */ 

   FNN = FNN + 1; 

   FILL TGALLEY WITH COMMAND 'font=superscript'; 
   FILL TGALLEY WITH STRING FNN; 
   FILL TGALLEY WITH COMMAND 'restore previous font'; 

   TIE FGALLEY TO TGALLEY; 
      /* Current point in FGALLEY must remain on same 
         or later page as current point in TGALLEY */ 

   FILL FGALLEY WITH COMMAND 'skip 1 line'; 
   FILL FGALLEY WITH COMMAND 'font=superscript'; 
   FILL FGALLEY WITH STRING FNN; 
   FILL FGALLEY WITH COMMAND 'restore previous font'; 
   FILL FGALLEY WITH INPUT; 
      /* FOOT grows upward as it fills. 
         If it runs into something, 
         a new page with a new FOOT is created, 
         and filling continues. */ 

END FOOTNOTE; 
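
The bookkeeping performed by the FOOTNOTE tag can be mimicked in Python. This sketch keeps only the counter, the two galleys, and the tie between them; fonts, spacing, and page placement are left out, and the class name, the bracketed reference mark, and the list-based galleys are assumptions made for illustration.

```python
class FootnoteTag:
    """Numbers each footnote and feeds the text and footnote galleys."""

    def __init__(self):
        self.fnn = 0        # current footnote number, like FNN
        self.tgalley = []   # body text galley (text fragments)
        self.fgalley = []   # footnote galley
        self.ties = []      # pairs (index in fgalley, index in tgalley)

    def __call__(self, note_text: str) -> int:
        self.fnn += 1
        self.tgalley.append(f"[{self.fnn}]")              # reference mark
        self.ties.append((len(self.fgalley), len(self.tgalley) - 1))
        self.fgalley.append(f"[{self.fnn}] {note_text}")  # the note itself
        return self.fnn
```

Each entry in `ties` plays the role of TIE FGALLEY TO TGALLEY: it records which footnote member must not be placed on an earlier page than which text member.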





PIC — A Language for Typesetting Graphics 

Brian W. Kernighan 

Bell Laboratories 
Murray Hill, New Jersey 07974 

ABSTRACT 

PIC is a language for specifying pictures so that they can be typeset as an integral part of a 
document preparation system. 

The basic objects in PIC are boxes, lines, arrows, circles, ellipses, arcs and splines, which 
may be placed anywhere and labeled with arbitrary text. Most of the effort in designing PIC has 
gone into making it possible to specify the sizes and positions of objects with minimal use of 
absolute coordinates. 

This paper describes PIC, with primary emphasis on those aspects of the language that 
make it easy to use. 


1. Introduction 

The Unix® operating system provides a family of pro- 
grams for document preparation. The basic tool is 
TROFF, a formatter written originally by J. F. 
Ossanna [1]. TROFF is comparable in capabilities (and to 
some extent style) with commercial formatters like 
PAGE-2 [2, 3], and somewhat more powerful than, but 
with a very different flavor from, a formatter like 
TeX [4]. 

Since TROFF is unbelievably hard to use in its 
naked form, it has been concealed by a variety of spe- 
cial languages. For instance, a set of standard, com- 
patible macro packages allows users to specify docu- 
ments in terms of logical divisions like paragraphs, 
sub-headings, and displays, and thus not really need 
to know any of TROFF's intricacies. This approach is 
used in many other macro-based formatters, and is an 
integral part of Scribe [5]. 

Mathematical material is handled by a separate 
program called EQN [6], which translates a language
specifically designed for mathematical expressions into 
TROFF commands. For example, the input 

Permission to copy without fee all or part of this material is 
granted provided that the copies are not made or distributed 
for direct commercial advantage, the ACM copyright notice and 
the title of the publication and its date appear, and notice is 
given that copying is by permission of the Association for Com- 
puting Machinery. To copy otherwise, or to republish, requires 
a fee and/or specific permission. 

Author's present address: Room 2C-518, Bell Laboratories, Mur- 
ray Hill, New Jersey 07974. 

© 1981 ACM 0-89791-043-5/81/0600/0092 $00.75


lim from {x -> pi /2} ( tan~x ) sup {sin~2x} ~=~ 1

produces

$$\lim_{x \to \pi/2} (\tan x)^{\sin 2x} = 1$$

EQN operates as a preprocessor; it translates 
mathematical parts of the text, while copying the rest 
from input to output untouched. In a Unix environ- 
ment, EQN and TROFF usually run together in a 
pipeline, where the output of EQN passes directly to 
the input of TROFF. 

eqn file | troff

The success of EQN has led to languages and
preprocessors for other aspects of document preparation.
TBL [7] provides a language for formatting tabular
material; REFER [8] expands concise bibliographic citations
into precise, properly formatted references. TBL
and REFER are also TROFF preprocessors, so a typical 
document is formatted by passing its text through a 
series of preprocessors, each of which processes that 
part which belongs to it while leaving the rest alone: 

refer file ... | tbl | eqn | troff

The Unix document preparation software is described 
more fully in [9]. 

A glaring omission from this set of programs is 
any facility for drawing figures. Consider drawing a 
picture of the flow of information in a document 
preparation pipeline: 


[Figure: flow of information in a document preparation pipeline]

This paper describes a language called PIC that 
provides facilities for drawing such figures as an 
integral part of the standard Unix document prepara- 
tion package. All of the figures in this paper have 
been drawn with PIC. 

The paper on IDEAL by Van Wyk [10] describes
another language for graphics typesetting that 
operates in the same environment, though the 
language itself is very different from PIC. 

2. Design Considerations 

It is all too common that figure preparation is 
essentially independent of the rest of preparing a 
document: figures are sketched, drafted (often several 
times until they are "right"), then glued onto the final 
manuscript. Any change requires a substantial effort. 
In fact, the whole process is so laborious that authors 
shy away from using figures merely because of the 
difficulty of producing them. 

The purpose of PIC is to eliminate this painful 
manual effort. It is mandatory that whatever the 
language, it fit naturally and comfortably into the rest 
of the system. Making sure that a figure fits entirely 
on a page or is printed in a particular size or font 
must be done as part of the normal document 
preparation process, not by cut and paste. 

Furthermore, figures themselves contain text, 
and this text ought to have access to the full capabili- 
ties of whatever software and hardware is available to 
ordinary text. This generally means that it is not ade- 
quate to simulate cut and paste electronically by 
inserting pictorial material created elsewhere (for 
example, as in the @picture command of Scribe).

I also decided very early that users would create 
pictures by typing ordinary ASCII text at a standard 
terminal, not by sketching them on a graphics device. 
This decision was in part forced by the fact that there 
are very few decent graphics input devices available 
(at least to me), while terminals are ubiquitous. This 
decision is analogous to one made for EQN, where 
symbols like α are typed with a spelled-out equivalent
like alpha rather than entered from a special key- 
board. I shall have more to say about this aspect of 
design later on. 

Of course a fundamental rule was that PIC had 
to cooperate with the existing Unix document 
preparation tools, so that, for example, one could 
draw pictures with equations in them: 



[Figure: a picture containing the equations $H(\omega)$ and $1 - H(\omega)$]



Thus PIC is yet another TROFF preprocessor. 

The remaining goals, insofar as they were ever 
articulated, were the normal ones of having a 
language that is easy to learn and use. This translates 
into a couple of more specific considerations. 

First, many graphics languages lean heavily on 
the use of absolute Cartesian coordinates for specify- 
ing the positions and sizes of objects. (See [11], for 
example.) Absolute coordinates are sufficient, in that 
anything can be expressed in such terms, but they 
make it hard both to create the first draft of a picture, 
and to modify it subsequently. PIC tries to provide 
alternate methods, so that a typical picture requires 
few, if any, absolute coordinates. 

Second, I have tried to arrange sensible defaults 
for everything in sight, so that at least simple pictures 
can be made with a minimal amount of specification. 

I will try to illustrate how these goals are 
addressed during the discussion of the PIC language. 

3. The PIC Language 

PIC provides as primitives boxes, lines, arrows, 
motions, circles, ellipses, arcs and B-splines (most of
which are illustrated above) plus facilities for position- 
ing and labeling them. A picture is written as a 
sequence of statements, one per line or separated by 
semicolons. Each statement requests a primitive, 
perhaps qualified by a set of attributes such as posi- 
tion, text, height, width, etc. 

The picture specification is introduced by a line 
beginning ".PS" and ends with the line ".PE". Anything
between these is a picture description to be converted
into TROFF commands; anything outside is
copied from input to output untouched. Thus, as a 
simple example, 

.PS
box "this is" "a box"
.PE

produces the archetypal box: 


[Figure: a box containing the centered text "this is" and "a box"]







The .PS and .PE lines are also copied to the output, 
so they can be used by a macro package to control 
things like centering a picture or making it all fit on a 
page. 
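
The pass-through behavior described above can be sketched in a few lines. This is only an illustration of the preprocessor pattern, not PIC's actual implementation; the function names and the placeholder translator are assumptions introduced here.

```python
def preprocess(lines, translate_picture):
    """Copy ordinary text unchanged; hand each .PS/.PE block to a translator."""
    out, picture, inside = [], [], False
    for line in lines:
        if not inside and line.startswith(".PS"):
            inside, picture = True, []
            out.append(line)                 # the .PS line itself is copied
        elif inside and line.startswith(".PE"):
            out.extend(translate_picture(picture))
            out.append(line)                 # so is the .PE line
            inside = False
        elif inside:
            picture.append(line)             # picture statements are collected
        else:
            out.append(line)                 # everything else is untouched
    return out


def placeholder_translator(statements):
    # Hypothetical stand-in: a real translator would emit TROFF commands.
    return ["<drawing commands for: " + s + ">" for s in statements]


doc = ["Some text.", ".PS", 'box "this is" "a box"', ".PE", "More text."]
for line in preprocess(doc, placeholder_translator):
    print(line)
```

Because the .PS and .PE lines survive in the output, a later stage such as a macro package can still see where each picture begins and ends.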

A primitive may be followed by any number of 
quoted text strings; these will be printed in the 
current size and font, centered vertically and horizon- 
tally at the center of the object. (Text position, size 
and font can be controlled more precisely if needed, 
but this default is right for the majority of situations.) 
Technically, text strings are attributes in exactly the 
same way that height and width are. 

Since the contents of the strings are not exam- 
ined by PIC, they may contain size or font control 
information to be passed on to TROFF; thus it is easy 
to get non-standard sizes and fonts: 



The proper mental model for writing a PIC pic- 
ture is to imagine oneself walking around on a plane 
dropping primitives. The plane has a Cartesian coor- 
dinate system with dimensions most easily expressed 
simply in inches. The last-resort mode of operation is 
to specify absolute size and position for each object 
explicitly, as in 

box height 0.3i width 0.5i at 0.25i,0.15i 
line from 0.25i, 0.15i to 0.75i, 0.15i 


If a number is followed by an "i" it signifies a dimen- 
sion in inches. Thus the pair 0.25i,0.15i 
represents an x,y coordinate measured in inches, not 
a complex number. 

As I have suggested several times, this is not at 
all convenient or flexible. The major tactics for avoid- 
ing such specifications are default dimensions, 
automatic positioning, and naming conventions for 
positions. 

Default dimensions 

Primitives have default dimensions, chosen so 
that instances of them fit together without too much 
effort. For example, the box is 3/4" by 1/2", as is the
ellipse; the circle and arc have a 1/4" radius, and the
standard line and motion are 1/2 inch.

The attributes up, down, left and right are 
applied to line or move to specify a direction, a dis- 
tance, or both. For example, 

line up right 

draws a line upwards and to the right: 



The distance is defaulted here, but could be given 
explicitly, as in "line up 0.5i right 0.5i." 

Lines may also be paths: 

line up right then down right 

then down left then up left 

draws a diamond. 



The word "then" separates components of a path. 
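
To see why the four-segment path closes into a diamond, the following sketch traces the vertices of a path written in this style. It handles only bare direction words separated by "then"; the step length is a parameter, and the value used here is an assumption rather than a statement about PIC's defaults.

```python
DIRECTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def path_points(spec, start=(0.0, 0.0), step=0.5):
    """Return the vertices visited by a path such as 'up right then down right'."""
    x, y = start
    points = [(x, y)]
    for segment in spec.split(" then "):
        for word in segment.split():
            dx, dy = DIRECTIONS[word]        # each word moves one default step
            x, y = x + dx * step, y + dy * step
        points.append((x, y))                # one vertex per path component
    return points


diamond = path_points("up right then down right then down left then up left")
print(diamond)
```

The last vertex coincides with the first, so the path is closed.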

Default dimensions are controlled by variables 
that may be set at any time to any value, by assign- 
ment statements like 

boxht = 0.3i; boxwid = 2 * boxht

Default motions

Objects are connected by default in the direction 
of motion, e.g., left to right, so the input 

arrow; box; arrow; circle; arrow 
produces 



The direction of motion can be changed by the explicit 
commands up, down, left, or right, and it also 
changes automatically when a line or arrow or arc 
implies a change, as in 

arrow; arc; arc cw; arrow 



The attribute cw means clockwise; default arcs are 
counter-clockwise. 

Labels and Corners

Pictures do not always grow left to right, so 
users need the ability to refer to parts of a picture 
drawn previously. 

First, any object can be labeled, as in

Start: box ...




This places a label Start on this box; subsequent 
commands may refer to Start, as in 

move to Start 

The label Start actually refers to the center of 
the object. To access a side or corner without using 
the coordinates of the center and arithmetic on the 
dimensions, PIC provides "corners," that is, standard 
named points on each object. Most objects have the 
eight compass point corners (north, northeast, 
etc.), top, bottom, left and right as synonyms for 
north, etc., and start, end and center. Thus one 
can say 

B1: box; arrow; box; arrow
B3: box
arc -> cw from top of B1 to top of B3



-> asks for an arrowhead. If the radius of an arc is 
not specified, a "large enough" radius will be sup- 
plied. 

Long phrases like "top of B" can also be written
as B.top or even B.t. Expressions involving
corners are also legal, as in

top of B + (0.5i, 0.1i)
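
The arithmetic that corners save the user can be sketched as follows. The table of names is a partial, hypothetical reconstruction from the names mentioned in the text; it is not PIC's own data structure.

```python
# Unit offsets from the center, in half-widths and half-heights.
OFFSETS = {"center": (0, 0),
           "n": (0, 1), "s": (0, -1), "e": (1, 0), "w": (-1, 0),
           "ne": (1, 1), "nw": (-1, 1), "se": (1, -1), "sw": (-1, -1)}
ALIASES = {"top": "n", "t": "n", "bottom": "s", "left": "w", "right": "e",
           "north": "n", "south": "s", "east": "e", "west": "w",
           "northeast": "ne", "northwest": "nw",
           "southeast": "se", "southwest": "sw"}


def corner(center, width, height, name, offset=(0.0, 0.0)):
    """Coordinates of a named corner of a box, plus an optional offset."""
    sx, sy = OFFSETS[ALIASES.get(name, name)]
    return (center[0] + sx * width / 2 + offset[0],
            center[1] + sy * height / 2 + offset[1])


# "top of B + (0.5i, 0.1i)" for a 0.75 by 0.5 box centered at (1, 1):
print(corner((1.0, 1.0), 0.75, 0.5, "top", offset=(0.5, 0.1)))
```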


Ordinal names 

Since it is clumsy to have to invent labels for 
things, objects may be referred to by their ordinal 
position in the input. The example above can be writ- 
ten instead as 

box; arrow; box; arrow; box 

arc -> cw from top of 1st box \ 
to top of last box 

All kinds of combinations like 1st, 2nd, 3rd last, 
and so on are recognized. (The "\" continues a long
input line.) 

Finally, it is possible to request that some corner 
of some object be placed at a specific point, so that 
arithmetic is not necessary: 

ellipse "1"
ellipse "2" with .nw at last ellipse.se

produces



4. Refinements and Embellishments 

Besides the facilities illustrated in the previous 
section, PIC provides a number of other services. 
This section gives a quick look at some of them. 

Although the dimensions in a picture are most 
easily given in inches, the resulting picture may well 
turn out too big or too small, particularly if the pic- 
ture has been moved from one document to another. 
Accordingly, if the .PS command is followed by a 
width, the picture is forced to be that wide, with the 
height adjusted to maintain the same aspect ratio. In 
this case, dimensions within the picture are in effect 
treated as scale-independent coordinates. 
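
The effect of a forced width can be sketched as a uniform scaling of every coordinate. The function below is an illustration under that assumption; its name and its choice of moving the lower left corner to the origin are not taken from PIC.

```python
def scale_to_width(points, target_width):
    """Scale points uniformly so the bounding box is target_width wide."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    natural_width = max(xs) - min(xs)
    if natural_width == 0:
        raise ValueError("picture has zero width; cannot scale")
    factor = target_width / natural_width    # same factor for x and y
    scaled = [((x - min(xs)) * factor, (y - min(ys)) * factor)
              for x, y in points]
    return scaled, factor


box = [(0, 0), (2, 0), (2, 1), (0, 1)]       # 2 wide, 1 high
scaled, factor = scale_to_width(box, 4)
print(factor, scaled)
```

Using one factor for both axes is what keeps the aspect ratio unchanged.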

There are a variety of attributes not mentioned 
earlier. Lines and boxes may be dashed or dotted. 
Lines may be chopped by a specified amount at either 
end; this is a convenient way to arrange that a line 
meets a circle properly at any angle without using tri- 
gonometric functions, as in a tree like this: 



The input which specifies this tree illustrates the PIC 
macroprocessor, as well as the chop attribute. 

circlerad = 0.15i; d = 0.5i; s = 0.3i 
define tree ' 

{ R: $1; move to R 
{ line from R to R - $4,d chop 

line from R to R + $4,-d chop } 

{ move left $4 down d - circlerad; $2 } 

{ move right $4 down d - circlerad; $3 } 

> 

tree(circle, circle "$op sub 1$",
  tree(circle, circle "$op sub 2$",
    circle "$op sub 3$", s), s)

Mathematical expressions are enclosed in $ signs for 
EQN; these are not processed by PIC in any way. 
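
Chopping a line drawn between two circle centers can be sketched as shortening it along its own direction, which needs a square root but no trigonometric functions. This is an illustrative reconstruction, not PIC's code, and the function name is hypothetical.

```python
import math


def chop(p, q, at_start, at_end=None):
    """Shorten segment p-q by the given amounts at each end."""
    if at_end is None:
        at_end = at_start
    (x1, y1), (x2, y2) = p, q
    length = math.hypot(x2 - x1, y2 - y1)
    if length <= at_start + at_end:
        raise ValueError("nothing left of the line after chopping")
    ux, uy = (x2 - x1) / length, (y2 - y1) / length   # unit direction
    return ((x1 + ux * at_start, y1 + uy * at_start),
            (x2 - ux * at_end, y2 - uy * at_end))


# A line between the centers of two circles of radius 1:
print(chop((0, 0), (3, 4), 1))
```

Chopping by the circle radius makes each end land on the circumference, whatever the angle of the line.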

Any sequence of statements in braces is pro- 
cessed as a group; the position upon completion is 
returned to where it was when the group was begun. 

Any object can be invisible, which is often a con- 
venient way to force positioning without drawing 
anything. For example, the same picture with invisi- 
ble circles looks like this: 


This facility is especially useful for positioning boxes; 
the second figure in section 5 makes heavy use of this 
capability. 


[Figure: the same tree drawn with invisible circles; only the connecting lines and the labels op1, op2, op3 appear]

Splines are actually B-splines. The spline is 
specified by a set of "guiding points" that are not 
actually on the curve (except for the ends), but control 
the position of the curve, as illustrated below: 



The spline is tangent to the midpoint of each segment 
of the path. This picture was created with the input 

line right 1i down 0.25i \
then left 1i down 0.25i \
then right 1i down 0.25i dashed
spline from start of 1st line \
right 1i down 0.25i \
then left 1i down 0.25i \
then right 1i down 0.25i
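
The paper does not give the spline algorithm. One curve with the stated property (it touches the midpoint of each path segment and is tangent to the segment there) is a uniform quadratic B-spline, which can be evaluated as a chain of quadratic Bezier pieces. The sketch below assumes that construction and uses the guiding points from the input above; treating the ends as straight runs to the first and last guiding points is also an assumption.

```python
def midpoint(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def quad_bezier(a, ctrl, b, t):
    """Point at parameter t on the quadratic Bezier from a to b."""
    s = 1 - t
    return (s * s * a[0] + 2 * s * t * ctrl[0] + t * t * b[0],
            s * s * a[1] + 2 * s * t * ctrl[1] + t * t * b[1])


def spline_points(guide, samples=8):
    """Sample a curve that runs from midpoint to midpoint of the guide path."""
    mids = [midpoint(p, q) for p, q in zip(guide, guide[1:])]
    curve = [guide[0]]                       # the curve starts at the first point
    for i in range(1, len(guide) - 1):
        for k in range(samples + 1):
            curve.append(quad_bezier(mids[i - 1], guide[i], mids[i], k / samples))
    curve.append(guide[-1])                  # and ends at the last point
    return curve


guide = [(0.0, 0.0), (1.0, -0.25), (0.0, -0.5), (1.0, -0.75)]
curve = spline_points(guide)
print(len(curve), curve[0], curve[-1])
```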

Arcs may be specified in a variety of ways. By 
default, "arc" in isolation requests a 90° counter- 
clockwise arc. It is also possible to specify the center, 
the radius, the start, and the end of the arc in arbi- 
trary combinations. PIC will fill in missing values. 
For instance, only the start and end were given for 
the arc in the previous section; the radius was com- 
puted from the end points. 

It is possible to put TROFF commands into any 
picture either in text strings or as separate pseudo- 
commands. So long as they make sense in the con- 
text where they occur, they can be used with impun- 
ity. They are most often useful for setting sizes and 
fonts globally. 

5. Experience and Observations 

PIC has been in use since early in 1980. At this 
time it has been used by at least 25 different people, 
mostly technical but with a smattering of secretaries. 
The pictures produced have ranged from simple ones 
like those in this paper through to some truly awe- 
some examples. This illustrates a chained-overflow 
hash table: 



This one illustrates the proof of a theorem in 
bin-packing; it makes heavy use of the "with" nota- 
tion. 


[Figures: the chained-overflow hash table and the bin-packing proof]


Here is another, from a paper on denotational 
semantics. 
















No uniform style of use has been observed. 
Some users rough out a sketch on graph paper (so 
much for my brave words about absolute coordi- 
nates!), while others use mostly relative positioning 
but a fair amount of adjustment by fractions of 
inches. 

The language is not as easy to use as I had 
hoped, although I rationalize that specifying pictures 
really is hard. There are some interactions between 
automatic positioning and the setting of directions 
that have to be ironed out. Users still need more 
labels and coordinates than I would like. Neverthe- 
less, several users have commented that while it isn't 
always much easier to get a first draft of a picture 
done, it is vastly easier to manage things thereafter. 
The picture is always present, always right, and rela- 
tively easy to modify as new ideas occur to the 
author. Parts of a picture can be re-used. 

One use of PIC that I had not thought of in the 
original design surfaced very quickly: there are now 
other programs that produce PIC as their output 
language. One of these is an interactive editor: a pic- 
ture is sketched on a tablet, then converted into PIC 
for subsequent processing. Another is a simple pro- 
gram that makes PIC commands to draw histograms 
from sets of numbers. Using another language to 
create PIC commands has much to commend it. 
Loops, conditions and complicated arithmetic are all 
handled nicely elsewhere, so PIC itself need not pro- 
vide them. 
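
A program of the kind described, one that writes PIC commands for a histogram, might look like the sketch below. It is not the program mentioned in the text. It uses only constructs shown earlier in this paper (box with height and width, quoted text, and "with ... at last ..."), and whether a particular PIC implementation accepts the output exactly as written is an assumption.

```python
def histogram_pic(values, bar_width=0.3, unit_height=0.1):
    """Return PIC input that draws one box per value, side by side."""
    lines = [".PS"]
    for i, v in enumerate(values):
        stmt = 'box width {}i height {:.2f}i "{}"'.format(
            bar_width, v * unit_height, v)
        if i > 0:
            # Stand each bar on the same baseline as the previous one.
            stmt += " with .sw at last box.se"
        lines.append(stmt)
    lines.append(".PE")
    return "\n".join(lines)


print(histogram_pic([3, 7, 5]))
```

The loop and the arithmetic live in the generating program, so the picture language itself needs neither.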

The question of whether it is better to have an 
interactive system or an ASCII-only scheme like PIC is 
hard to answer in general. The local computing 
milieu has a large effect — we have poor facilities for 
interactive input, but almost everyone has a terminal 
in his/her office and home. The advantage of a linear 
text representation is that it is well suited to machine 
manipulation. The program to create PIC commands 
for histograms would be hard to imagine in a purely 
interactive environment. The textual representation 
has information in it about the relationships of objects 


that is often lost (or not collected at all) in an interac- 
tive system. For example, if a box is stored as four 
line segments, it is no longer a box as far as the sys- 
tem is concerned. 

There are several ideas for extending PIC to 
address some of its deficiencies. One trivial thing 
would be to add some kind of "constraint" specifica- 
tion, so lines that are just horizontal and vertical 
paths could be specified as 

line right enough 

then down enough to get to ... 

The limited form of block structure with braces 
should be enhanced, for example to make the coordi- 
nates and perhaps the names within it local to that 
block. More generally, PIC could profit from a pro- 
cedure mechanism. This would provide local scope 
for names and coordinates, and a better way to create 
repetitive structures. The "corner" mechanism could
be generalized here as well, to permit users to define 
the corners on objects as needed. Furthermore, it 
would obviate the pressure for more and more primi- 
tives ("Why isn't diamond a primitive like box?"). 

Acknowledgments 

I am grateful to Chris Van Wyk for perceptive 
comments on this paper and for the circle and line 
drawing algorithms used in PIC; to Theo Pavlidis, 
who developed the basic spline algorithm; and to 
Brenda Baker and Ravi Sethi for the large examples of 
Section 5. 

References 

1. J. F. Ossanna, "Nroff/Troff User's Manual," 
UNIX Programmer's Manual 2, Section 22 (January 
1979). 

2. John Pierson, Computer Composition Using PAGE-1,
Wiley-Interscience, 1972.

3. Michael P. Barnett and Barbara H. Barnett, 
"Computer Graphics and Electronic Publishing," 
Proc. 1980 Harvard Computer Graphics Symposium. 

4. Donald E. Knuth, "TAU EPSILON CHI: A Sys- 
tem for Technical Text," STAN-CS-78-675.1, 
Computer Science Department, Stanford Univer- 
sity, Stanford, California (November 1978). 

5. Brian K. Reid, "Scribe: a high-level approach to 
computer document formatting," 7th Symposium 
on the Principles of Programming Languages, Las 
Vegas (January 1980). 

6. Brian W. Kernighan and Lorinda L. Cherry, "A 
System for Typesetting Mathematics," Communi- 
cations of the ACM 18(3), pp. 151-157 (1975). 

7. M. E. Lesk, "Tbl — A Program to Format 
Tables," UNIX Programmer's Manual 2, Section 10 
(January 1979). 

8. M. E. Lesk, "Some Applications of Inverted
Indexes on the UNIX System," UNIX
Programmer's Manual 2, Section 11 (January 1979).

9. B. W. Kernighan, M. E. Lesk, and J. F. Ossanna,
"Unix Time-Sharing System: Document Preparation,"
Bell Sys. Tech. J. 57(6), pp. 2115-2135
(1978).

10. C. J. Van Wyk, "A Graphics Typesetting
Language," SIGPLAN Symposium on Text Manipulation,
Portland (June, 1981).

11. John C. Beatty, "PICTURE — A picture-drawing 
language for the Trix/Red Report Editor," 
Lawrence Livermore Laboratory Report UCID-30156
(April 1977).






A Graphics Typesetting Language 

Christopher J. Van Wyk 

Bell Laboratories 
Murray Hill, New Jersey 07974 


ABSTRACT 

We present a new programming language, IDEAL, in which two-dimensional figures can be
expressed. The language is intended to work with existing text-formatting systems so that text 
and figures can be typeset at the same time. 

The building block for IDEAL programs is called a "box"; it shares some features with pro-
cedures and some with records in general-purpose programming languages. A box includes a 
system of constraints (in this incarnation, equations in complex variables) that declares the rela- 
tive positions of its significant points and requests for actions to be performed at those points. 

A box is called by adding enough constraints to the system in its definition that its signifi- 
cant points can be solved for uniquely. A box call may also include additional actions to be per- 
formed during the call. The notions of drawing a line using a pattern and texturing a polygonal 
area follow directly from the mechanism for defining and calling boxes. 

Users may also ask for a box to be "opaque" so that it blots out pieces of picture that it 
covers. Finally, two commands embody the idea of sketching several pictures on different paral- 
lel planes, then merging them into a single picture. We use these when erasing to create one 
part of a picture would destroy another part that we want. 

We discuss good algorithms for implementing IDEAL when the constraints are expressed as
a special kind of nonlinear system and the pictures are composed of straight lines and circular 
arcs. The language has been implemented, and was used to produce the paper. 


Introduction 

Suppose you're using a document preparation system 
to prepare a paper and you want to include figures. 
Even if you have both a high-resolution typesetter 
and a quality graphics output device, you still wind 
up cutting and pasting the figures into the document 
for every draft and the final paper. This job can be 
especially arduous if your figures include text: then 
you get to cut and paste a little piece of paper onto 
the picture for each caption. 

This paper presents IDEAL, a language that 
allows you to describe your figure, including text, so 


Permission to copy without fee all or part of this materi- 
al is granted provided that the copies are not made or distribut- 
ed for direct commercial advantage, the ACM copyright notice 
and the title of the publication and its date appear, and notice 
is given that copying is by permission of the Association for 
Computing Machinery. To copy otherwise, or to republish, re- 
quires a fee and/or specific permission. 

Author's present address: Christopher J. Van Wyk, 
Computing Science Research Center, Bell Laboratories, Murray 
Hill, NJ 07974 

© 1981 ACM 0-89791-043-5/81/0600/0099 $00.75


that the document preparation system will draw and 
caption the figures for each draft and the final copy. 
The language is general-purpose in that it knows 
nothing about particular kinds of figures like electric 
circuit diagrams or chemical structure formulas. It is 
also more than just a set of standard graphics routines 
to draw points, lines, and circular arcs like some pre- 
vious graphics languages[14, 18]. 

The constructs in this block-structured declara- 
tive graphics language are all new. The reader may 
recognize the "box" notion as a generalization of the 
one proposed by Kernighan and Cherry [12], and elaborated
by Knuth in the TeX† system [13], and the use
of simultaneous equations as an extension of Knuth's
use of them in METAFONT [13]. IDEAL has been implemented
and was used to produce this paper; in fact,
other users have prepared figures with it too! We discuss
some implementation issues below.

It is important to remember that the subject of 

† The American Mathematical Society holds the copyright on
the TeX logo.




this paper is a set of ideas for useful constructs in a 
graphics language. We have concentrated on explain- 
ing how a small set of primitives in concert yield a 
powerful graphical expression language, so we have 
sacrificed some conciseness in the syntax. Keep in 
mind too that many of the simple figures used as 
examples would, in practice, be available in a library 
of common components. 

Overview 

We call the fundamental object of which figures 
are composed a "box." (Don't be misled: the name 
was chosen for historical reasons and is now synec- 
dochical for any bunch of picture and text that hangs 
together.) Box definitions have three parts: 

• a set of constraints on the relative locations of 
points of the box; 

• a set of commands relative to points of the box, 
such as requests to draw a line connecting two 
of them or to place another box with respect to 
some of them; and, optionally, 

• a boundary list of points that specifies the 
polygonal region that the box will erase if asked 
to do so. 

When a box is to be instantiated, we add constraints 
to those already in its definition, and altogether the 
constraints determine uniquely the size, shape, and 
location of that instance of the box. 

Some other common drawing operations are 
readily expressed in terms of the box notion: 

• drawing dashed or dotted lines is a special case 
of replicating a box along a line; 

• drawing a collection of objects that overlap so 
that parts of some obscure parts of others 
amounts to asking for some boxes to erase parts 
of others; and, 

• cross-hatching a polygonal area is a special case 
of replicating a box containing a desired texture 
over the area, then erasing what falls outside. 

A simple pair of operations enables us to local- 
ize the effect of erasing. The mental model is that we 
sketch pieces of a picture on different pieces of paper, 
then merge them into a complete figure. 

In what follows, we imagine ourselves to be 
drawing on the complex plane, identifying points 
with complex numbers in the usual way. The complex
number (0,1) is the imaginary unit, $\sqrt{-1}$. Absolute
numbers that appear indicate only relative magnitudes
and positions; final decisions on position and
scale are made when the figure is about to be drawn.

Boxes 

Behold the quintessential box: 


[Figure: a rectangle with corners labeled topleft, topright, botleft, and botright]


rectangle {
    var botright, botleft,
        topright, topleft,
        width, height;
    botright = botleft + width;
    topleft = botleft + (0,1)*height;
    topright = topleft + width;
    conn botleft to botright;
    conn botright to topright;
    conn topright to topleft;
    conn topleft to botleft;
}

In the first six lines of this code we declare six com- 
plex variables that are local to this box and partially 
constrain their values by giving three simultaneous 
equations that they must satisfy. Note that this con- 
straint set does not determine uniquely the values of 
the variables: it just specifies a subset of the con- 
straints that any instance of rectangle must satisfy. 
We will augment it when we call for an instance of 
this box. Each of the next four lines of code draws a 
side of the rectangle: the IDEAL primitive "conn x to y"
means x is connected to y by a straight line. 

We call for boxes to be drawn with a put state- 
ment. These statements are analogous to procedure 
calls in conventional programming languages, but 
whereas the usual procedure call includes a parameter 
list in parentheses, put statements include a parameter 
section, made up of constraints to be added to the 
declaration section of the called box so that its local 
variables are determined uniquely. We use this 
mechanism so that each call of a box can reflect 
directly the constraints particular to that instance of 
the box. For instance, sometimes we know one
corner and the dimensions the rectangle should have,
and other times we know two corners and the orthogonal
dimension. Both of the put statements below
draw the same box; note that in each we have speci- 
fied only what we know about the rectangle. 

put rectangle { 

height = 0.5; 
width = 0.75; 
botleft = 0; 

} 


or 

put rectangle { 

topleft = (0,0.5); 
botleft = 0; 
width = 0.75; 

} 
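
To check that the two put statements above really describe the same box, the three equations of rectangle can be solved by hand for each set of givens. The sketch below does exactly that with Python's complex numbers; it is not a general constraint solver and says nothing about how IDEAL itself solves the system. The function names are hypothetical.

```python
def rectangle_from_corner(botleft, width, height):
    """Apply the three equations of the rectangle box directly."""
    botright = botleft + width
    topleft = botleft + 1j * height          # (0,1)*height
    topright = topleft + width
    return botleft, botright, topright, topleft


def rectangle_from_left_side(topleft, botleft, width):
    """Same box, but height is recovered from topleft = botleft + (0,1)*height."""
    height = (topleft - botleft) / 1j
    return rectangle_from_corner(botleft, width, height)


first = rectangle_from_corner(botleft=0, width=0.75, height=0.5)
second = rectangle_from_left_side(topleft=0.5j, botleft=0, width=0.75)
print(first == second)
```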




We can also add nontrivial equations that link the 
values of several variables. In this example, we add 
an equation that amounts to asking for a square: 






put rectangle {
    topright = 0;
    width = 0.5;
    topleft - botleft
        = (0,1)*(botright - botleft);
}

Using complex values in the constraint equations 
allows us to rotate boxes easily: 


put rectangle {
    botleft = 0;
    botright = (1,1);
    height = (0.3,0.3);
}

and even deform them into parallelograms: 




put rectangle {
    botleft = 0;
    botright = (1,0.6);
    topright = (1,1.1);
}

Thus, each put represents a specially tailored 
instance of a box. There is no reason to restrict this 
tailoring to adding constraints to those already in the 
box: we can add any valid instruction to be in effect 
only for the put in which it appears. So, when we 
need a sequence of instructions for a one-shot picture 
component, we don't need to break it out as a 
separately defined box: we simply include it as the 
parameter section of a put of the simple box defined
as null {}.

To enable us to refer to the local variables of a 
box that has been placed already, we name put state- 
ments. For instance, we might write these instruc- 
tions to illustrate the insertion of an element into a 
linked list: 

(Note: $\alpha[x,y]$ means "$\alpha$ of the way from $x$ to $y$"; that
is, $\alpha[x,y] = x + \alpha(y - x)$. Recall too that cis $\theta$ is a unit
vector in the direction $\theta$; formally,
$\operatorname{cis}\theta = e^{i\theta} = \cos\theta + i\sin\theta$. Angles appear in degrees.)



listnode {
    /* size is global */
    var ne, nw, sw, se;
    ne = nw + 2*size;
    se = sw + 2*size;
    nw = sw + (0,1)*size;
    conn ne to nw;
    conn nw to sw;
    conn sw to se;
    conn se to ne;
    conn 0.5[nw,ne] to 0.5[se,sw];
}

arrow {
    /* head tells the length of
       the arrowhead along the shaft */
    var head, headvec, start, end;
    headvec = end +
        head*(start-end)/abs(start-end);
    conn start to end;
    conn end to cis(25)[end,headvec];
    conn end to cis(-25)[end,headvec];
}

main {
    var size;
    size = 0.4;
    put first: listnode {
        nw = 0;
    };
    put last: listnode {
        sw = 2[first.sw,first.se];
        /* null pointer: */
        conn ne to 0.5[se,sw];
    };
    put new: listnode {
        nw = 2[first.ne,first.se];
    };
    put arrow {
        start = 0.75[first.sw,first.se]
            + (0,0.5)*size;
        end = new.nw;
        head = 0.25;
    };
    put arrow {
        start = 0.75[new.sw,new.se]
            + (0,0.5)*size;
        end = last.sw;
        head = 0.25;
    };
}


Note the following uses of IDEAL features: 

• conns need not draw lines between named 
points, but can connect points expressed in 
terms of other points; 

• arrow is written so as to keep the head of the
arrow properly oriented about the shaft of the
arrow, no matter at what angle the arrow
points; and,

• the null pointer in the final listnode is drawn by
a conn statement added to the put named last.
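
The way arrow keeps its head oriented can be checked numerically. The sketch below transcribes the bracket notation and cis into Python complex arithmetic and computes the two barb endpoints; the function names are ours, and the code illustrates the equations in the arrow box rather than IDEAL's implementation.

```python
import cmath
import math


def cis(degrees):
    """Unit vector in the given direction; angles are in degrees."""
    return cmath.exp(1j * math.radians(degrees))


def between(alpha, x, y):
    """alpha[x,y] = x + alpha*(y - x); alpha may itself be complex."""
    return x + alpha * (y - x)


def arrowhead(start, end, head):
    """Endpoints of the two barbs drawn by the arrow box."""
    headvec = end + head * (start - end) / abs(start - end)
    return (between(cis(25), end, headvec),
            between(cis(-25), end, headvec))


barb1, barb2 = arrowhead(start=0, end=1 + 1j, head=0.25)
print(abs(barb1 - (1 + 1j)), abs(barb2 - (1 + 1j)))
```

Multiplying the vector from end to headvec by cis(±25) rotates it about end, so both barbs have length head and sit symmetrically about the shaft at any angle.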


It is easy to include text in figures if the typeset- 
ting system will accept any piece of text and wrap it 
up for us in a box. Of course, a sufficiently patient 
and ambitious typesetting system maintainer could 
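make these boxes arbitrarily fancy, but here is a simple
textbox containing the asymptotic formula for harmonic
numbers:

[Figure: a text box containing the asymptotic formula for harmonic numbers (right-hand side $\log n + \gamma + O(1/n)$), with its reference point rp and corner sw marked]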

The system computes the height (ht), width
(wd), and depth (dp) of the string, and makes the box
containing the string available to us. Now we can
place it by pinning down any corner or the reference
point (rp). We choose this mechanism so that special 
features of the typesetting system (in this case, 
mathematics formatting) can be used to set figure cap- 
tions, while at the same time the dimensions of the 
string are accessible to other boxes for use in con- 
straints, put and conn statements. 

Circles and circular arcs fit nicely into the box 
framework. Although they are actually primitives in 
the language, we can think of a circle as a box defined 
like this: 

circle {

var z1, z2, z3, radius, center;

(re(z1) - re(center))**2
+ (im(z1) - im(center))**2
= (re(z2) - re(center))**2
+ (im(z2) - im(center))**2
= (re(z3) - re(center))**2
+ (im(z3) - im(center))**2
= radius**2;

/* some magic instructions 

to the system to draw a circle 

of the specified radius 

at the specified center */ 

} 

Circles point up well the advantages of our methods 
for defining and placing boxes. We can use the same 
box to draw a circle by giving its center and radius, or 
by giving three points on it (z1, z2, and z3), or by
giving its center and one point on it (z1, z2, or z3),
and so on. In a conventional programming language, 
we would need a different procedure for each dif- 
ferent set of possible parameters. We asked for the 
circle on the left below by giving its center and 
radius, but we gave the center and point z2 for the 
circle below on the right. 



The box for a circular arc can be defined in a similar 
fashion to allow a variety of arc specifications. (See 
Appendix 1.) 

More Operations With Boxes 

Technical illustrations often call for different tex- 
tures of lines — some dashed, others dotted, these 
doubled, those wavy. A simple iteration construct 
encompasses these possibilities by formalizing the 
notion of "drawing along a line with a box." We write 
conn a to b using n pen { . . . }<x,y>;

as shorthand for 

for i = 1 to n
put pen {

x = ( (i-1)/n )[a,b];
y = ( i/n )[a,b];

}; 


that is, replicate pen n times along the line from a to
b, hooking the ith instance's point x to the (i-1)st
instance's point y.
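As an illustration outside IDEAL itself, the expansion above can be sketched in Python, treating points as complex numbers the way IDEAL does. The names `lerp` and `expand_pen` are hypothetical helpers introduced only for this sketch; `lerp(t, a, b)` plays the role of the bracket notation t[a,b].

```python
def lerp(t, a, b):
    """IDEAL's t[a,b]: the point a fraction t of the way from a to b."""
    return a + t * (b - a)


def expand_pen(a, b, n):
    """Return the (x, y) pair pinned down for each of the n pen instances."""
    return [(lerp((i - 1) / n, a, b), lerp(i / n, a, b))
            for i in range(1, n + 1)]


# Four instances along the segment from 0 to 4 on the real axis.
segments = expand_pen(0 + 0j, 4 + 0j, 4)
for x, y in segments:
    print(x, y)
```

Each instance's x coincides with the previous instance's y, so the copies join end to end and together cover the line from a to b exactly.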

For instance, we might use the following box: 


four copies of wavy 


box wavy {

/* perp is a unit vector normal
to (end - start) */
var start, end,
ht, perp, pt1, pt2;
perp = (0,1)*
(end - start)/abs(end - start);
pt1 = (1/4)[start, end] + ht*perp;
pt2 = (3/4)[start, end] - ht*perp;
conn start to pt1;
conn pt1 to pt2;
conn pt2 to end;

}


to highlight a hamiltonian path of this graph: 



Note that the waves of the pen scale to fit the line 
exactly. In contrast, if we used a template to draw 
such lines, we would probably have to truncate the 
pattern at one or both ends. 

We may also ask for a particular instance of a 
box to be 'opaque,' which means it obliterates pieces 
of lines that are already in the picture and that lie in 




the region specified by the boundary list. Opaque 
boxes are useful for placing captions over lines in a 
picture, as in this example: 



But opaque boxes turn out to have many appli- 
cations when we allow a box to have either an opaque 
interior or an opaque exterior. 

We can use pens and opaque boxes to shade 
regions. For instance, to shade the interior of a pen- 
tagon using the wavy pen defined above, we first 
construct a box "brush," which consists of seven 
copies of "wavy" going horizontally: 


brush { 

var top, bot; 
var bwd, bht; 
var leftpt, rightpt; 
leftpt = 0.5*(top+bot) - bwd/2; 
rightpt = 0.5*(top+bot) + bwd/2; 
conn leftpt to rightpt 
using 7 wavy { 
ht = bht; 

}<start,end> ; 

} 


Then we use "brush" to draw vertically over the 
region where the pentagon will lie. 




conn (0,1) to 0 

using 7 brush { 
bwd = 1; 
bht =0.1; 
}<top,bot>; 



Finally, we ask for a pentagon with an opaque exterior.

pentagon {

var pt1, pt2, pt3, pt4, pt5;
var radius, center;
pt1 = center + radius;
pt2 = center + cis(72)*radius;
pt3 = center + cis(144)*radius;
pt4 = center + cis(-144)*radius;
pt5 = center + cis(-72)*radius;
conn pt1 to pt2;
conn pt2 to pt3;
conn pt3 to pt4;
conn pt4 to pt5;
conn pt5 to pt1;
bdlist = pt1, pt2, pt3, pt4, pt5;

}

put pentagon {

center = (0,0.5);
radius = (0,0.5);
opaque exterior;

};


Opaque boxes can be too destructive sometimes:
just try drawing two shaded regions in the same pic-
ture! To circumvent this difficulty, IDEAL includes a
pair of commands inspired by the physical analogy of
tracing paper and slides for overhead projectors. The
construct command is just like put, but it lays a sheet
of tracing paper over the current picture, so we can
still see points on the picture, and draw to our hearts'
content, without touching the picture beneath at all.
The draw command transfers an image from the
named sheet of tracing paper to the picture under-
neath.

Now, to draw any number of shaded regions in
a picture, we just construct each on a separate tran-
sparency, using pens and opaque boxes as described,
and weld the set together at the end.



main { 

construct A: null { 
conn (0,1) to 0 

using 7 brush { 
bwd = 1 ; 
bht = 0.1;

}<top,bot>; 
put pentagon { 

center = (0,0.5) ; 
radius = (0,0.5); 
opaque exterior; 

}; 

};

construct B: null { 

conn (0.5, 0.5) to (1.5, 0.5) 
using 5 brush { 

bwd = (0,1); 
bht = 0.1; 

}<top,bot>; 
put circle { 

center = (1,0.5) ; 
radius = 0.5; 
opaque exterior; 

}; 

}; 

draw A; 
draw B; 

}

The construct command also provides a way to simu- 
late "invisible boxes," which do not appear in the pic- 
ture, but whose points can be referenced nonetheless. 
This can be useful, for example, when a geometric 
construction calls for a reference line that should not 
appear in the final picture. 

Summary 

We have presented a one-dimensional language 
for talking about two-dimensional figures that is 
intended to be used in cooperation with a typesetting 
system for preparing graphical displays in documents. 
The language includes a declarative means for defin- 
ing "boxes" — picture components — so that they 
can be tailored to fit in many different circumstances 
by means of a generalized procedure mechanism. We 
might, for instance, ask for a rectangle stretched so its
upper left corner is "here" and its lower right corner
is "there."

From boxes we moved to "pens," which use 
boxes to draw lines of different textures, and to 
"opaque" boxes, which erase pieces of pictures. 
Taken together, these primitives allow us to fill in 
regions with textures. Another pair of commands 
localizes the effects of pens and erasers, so that pic- 
tures can be built up from subpictures. 

Algorithms 

1. Variable Reference and Scope 

The following rules governing where variables
may be referenced help to keep the use of variables
disciplined, so that IDEAL programs can be written and
debugged with some confidence:


• In box definitions, we may reference variables 
that we expect to be global to the box whenever 
it is put, and any of the variables of boxes that 
are put from it. 

• In put statements, we may reference variables 
known either in the calling or the called box; 
note that this includes variables known in other 
puts from the same box. 

The picture of insertion into a linked list illustrates all 
of these possibilities. 

2. Constraints 

We have described IDEAL using the abstract term 
"constraints" rather than the more concrete "equa- 
tions." But just what kinds of constraints should we 
expect to handle in the declaration sections of boxes? 
Here are worst case complexity results for finding the 
solutions of various kinds of constraints. 

• linear equations
      $O(n^3)$ [7]

• linear inequalities
      (same as linear programming[8]) high-order
      polynomial[1] or "practical exponential"[5]

• linear equations with the operator max
      NP-hard (decision problem is NP-complete[21])

• quadratic equations
      NP-hard (decision problem is NP-complete[21])

• cubic equations in integers with linear inequalities
      Fermat's Last Theorem is a special case

• polynomials with the sine and absolute value functions
      recursively undecidable[4, 6, 16]

We see from this list that moving beyond linearity in 
the set of allowed constraints quickly becomes compu- 
tationally difficult, and some steps just beyond are 
impossible in general. 

On the other hand, things are not as hopeless 
as some of the complexity results above suggest. 
Many subproblems of practical interest can be solved 
efficiently. For example, it is possible to reduce the 
set of quadratic equations for a circle (shown above) 
to a larger set of almost linear equations that can be 
solved by the method described below. But complex- 
ity is only one of our concerns. One of the properties 
of linear equations on which we have relied heavily is 
that two non-coincident lines intersect in at most one 
point. But even two quadratics can intersect in four! 
So including higher-order polynomials would require 
that we develop a means of specifying which point of 
an intersection set is meant. Finally, how much 




symbolic algebra capability should we build into a 
typesetting system? It seems wise to keep the pro- 
gram (relatively) simple by including only the most 
useful forms of equation systems. 

3. Slightly Nonlinear Systems 

In the present implementation we allow con- 
straints to appear as “slightly nonlinear” systems of 
equations. A system of equations is slightly nonlinear 
if there is an order in which the equations can be pro- 
cessed such that, after substituting results known 
from previous processing, the equation looks linear. 
An example of such a system is 

x=3 
y=xz 
y = z + 2 

Even though it includes the nonlinear equation y=xz, 
we can solve the system in this order by using partial 
results as we read each equation. 

We begin solving a system of equations by 
declaring each variable to be independent and placing 
all equations on a queue for processing. To process 
an equation, replace any dependent variables by their 
representation in terms of independent variables, per- 
form all possible multiplications, divisions, and func- 
tion evaluations, and subtract one side from the 
other, equating the result to zero. If the result is a 
linear combination of zero or more independent vari- 
ables, choose the appropriate option: 

• If the result contains no variables, then it says
either "0 = 0" or "0 ≠ 0"; the former means
the equation was redundant, while the latter
means it was inconsistent.

• If the result contains one variable, we can solve 
for its value. 

• If the result contains more than one variable, 
we choose one, and make it dependent on the 
others. (Numerical analysts usually pick a larg- 
est element of a row as pivot, so we also pick a 
variable whose coefficient is largest in absolute 
value.) 

If the result is not linear, say, because two unknowns 
were multiplied, place the equation back on the queue 
of equations for later processing. If we ever go 
through the whole queue without learning anything, 
the system is not slightly nonlinear, and an error mes- 
sage is required. 
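The queue-based procedure just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the IDEAL implementation: expressions are hypothetical nested tuples such as `("*", x, z)`, only `+`, `-`, and `*` are handled (no division or function evaluation), and every name here (`combine`, `evaluate`, `solve`) is introduced just for the sketch.

```python
from collections import deque

CONST = ""  # key holding the constant term of a linear form


class NonLinear(Exception):
    """Raised when an equation is not yet linear in the unknowns."""


def combine(f, g, scale_g=1.0):
    """Return the linear form f + scale_g * g, dropping zero terms."""
    out = dict(f)
    for name, coef in g.items():
        out[name] = out.get(name, 0.0) + scale_g * coef
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def is_const(f):
    return all(name == CONST for name in f)


def evaluate(expr, dependent):
    """Reduce an expression tree to a linear form over independent variables."""
    op = expr[0]
    if op == "num":
        return combine({}, {CONST: float(expr[1])})
    if op == "var":
        return dict(dependent.get(expr[1], {expr[1]: 1.0}))
    a, b = evaluate(expr[1], dependent), evaluate(expr[2], dependent)
    if op == "+":
        return combine(a, b)
    if op == "-":
        return combine(a, b, -1.0)
    if op == "*":
        if is_const(a):
            return combine({}, b, a.get(CONST, 0.0))
        if is_const(b):
            return combine({}, a, b.get(CONST, 0.0))
        raise NonLinear  # two unknowns multiplied: defer this equation
    raise ValueError("unknown operator: %r" % (op,))


def solve(equations):
    """Solve (lhs, rhs) pairs; assumes every variable ends up determined."""
    dependent = {}            # variable -> linear form over independents
    queue = deque(equations)
    deferred = 0              # equations deferred since the last progress
    while queue:
        if deferred == len(queue):
            raise ValueError("system is not slightly nonlinear")
        lhs, rhs = queue.popleft()
        try:
            form = combine(evaluate(lhs, dependent),
                           evaluate(rhs, dependent), -1.0)
        except NonLinear:
            queue.append((lhs, rhs))
            deferred += 1
            continue
        deferred = 0
        unknowns = [name for name in form if name != CONST]
        if not unknowns:
            if CONST in form:
                raise ValueError("inconsistent equation")
            continue          # 0 = 0: redundant
        # Pivot on the coefficient that is largest in absolute value.
        pivot = max(unknowns, key=lambda name: abs(form[name]))
        coef = form.pop(pivot)
        value = combine({}, form, -1.0 / coef)
        for name, old in dependent.items():
            if pivot in old:  # keep earlier results in terms of independents
                rest = {k: v for k, v in old.items() if k != pivot}
                dependent[name] = combine(rest, value, old[pivot])
        dependent[pivot] = value
    return {name: f.get(CONST, 0.0) for name, f in dependent.items()}


x, y, z = ("var", "x"), ("var", "y"), ("var", "z")
system = [
    (x, ("num", 3)),             # x = 3
    (y, ("*", x, z)),            # y = xz
    (y, ("+", z, ("num", 2))),   # y = z + 2
]
solution = solve(system)
print(solution)
```

If the equation y = xz is placed first, it is deferred once and processed again after x is known, which is the reordering behavior the text describes.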

4. Special Nonlinear Systems 

The only truly nonlinear equations we must 
solve arise in processing circles and arcs. To place a 
circle, for instance, we must know its center and its 
radius. Given three points on a circle, we can plug 
their coordinates into the quadratic equation for a cir- 
cle to get three quadratics in three real unknowns. 


Subtracting the quadratics in pairs gives a system of 
two linear equations in two unknowns from which we 
can solve for the center, and from the center and any 
point on the circle we can determine the radius. The 
other cases that arise for circles and circular arcs also 
reduce to linear equations. Appendix 1 contains the 
library specification of circles and arcs. 
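The reduction for three points can be checked with a short Python sketch. It mirrors the pairwise subtraction described above and solves the resulting 2-by-2 linear system by Cramer's rule; the function names `norm2` and `circle_through` are hypothetical, chosen only for this illustration.

```python
def norm2(z):
    """Squared magnitude of a complex number."""
    return z.real * z.real + z.imag * z.imag


def circle_through(z1, z2, z3):
    """Return (center, radius) of the circle through three complex points."""
    # Subtracting |z1 - c|^2 = r^2 from |z2 - c|^2 = r^2 cancels the
    # quadratic terms in c, leaving a linear equation; likewise for z2, z3.
    a1, b1 = 2 * (z2.real - z1.real), 2 * (z2.imag - z1.imag)
    c1 = norm2(z2) - norm2(z1)
    a2, b2 = 2 * (z3.real - z2.real), 2 * (z3.imag - z2.imag)
    c2 = norm2(z3) - norm2(z2)
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("points are collinear")
    center = complex((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)
    return center, abs(z1 - center)


center, radius = circle_through(1 + 0j, 0 + 1j, -1 + 0j)
print(center, radius)
```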

5. Opaque Boxes 

Placing an opaque polygon is the well-known 
problem of polygon clipping. We use Sutherland and 
Hodgman's polygon clipping algorithm[19], which 
handles concave polygons as well as convex ones. 

We can implement opaque circles by solving for 
the intersection of the opaque circle with other lines 
and circles. (This is especially easy because circles are 
convex figures.) But just what is an opaque arc? Two 
reasonable choices are the sector and the segment of 
the circle subtended by the arc: 

[Figure: the sector and the segment subtended by an arc.]

It turns out we need both, because we can't write, 
say, an opaque segment box given an opaque sector 
box, or vice versa. The root of the problem is that 
sectors have larger area than segments when the arc 
subtends fewer than π radians, but the reverse is true
for longer arcs. 
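A short derivation makes the area comparison concrete. The symbols $r$ (radius) and $\theta$ (angle subtended by the arc, $0 < \theta < 2\pi$) are introduced here for illustration:

```latex
A_{\text{sector}} = \tfrac{1}{2} r^2 \theta, \qquad
A_{\text{segment}} = \tfrac{1}{2} r^2 (\theta - \sin\theta), \qquad
A_{\text{sector}} - A_{\text{segment}} = \tfrac{1}{2} r^2 \sin\theta .
```

The difference is positive for $0 < \theta < \pi$ and negative for $\pi < \theta < 2\pi$, so neither region contains the other for all arcs.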

Discussion 

We mentioned at the start of this paper that we 
would emphasize ideas for a graphics language rather 
than suggest a concise syntax. We could get rid of 
most of the keywords in IDEAL with hardly a change 
to the yacc[9] grammar. Certain useful abbreviations, 
like “conn x to y to z" for a succession of line seg- 
ments through those points, would also make pro- 
grams shorter. It is tempting to suggest lots of 
language features that would make particular kinds of 
pictures easier to draw. It is probably more fruitful, 
however, to design a higher-level language for partic- 
ular classes of drawings, like circuit diagrams or parse 
trees, that compiles into IDEAL. Readers may recog-
nize the influence of the philosophy that also under-
lies other tools in the UNIX† system like EQN[12].

On the other hand, programmers using IDEAL 
notice that certain “idioms" occur frequently. For 
instance, (0,1)* (y -x)/abs(y -x) is a unit vector per- 
pendicular to the segment between x and y . And, of 
course, most pictures include some common ele- 
ments like rectangles and arrows. A simple language 
that knows about these idioms (like PIC[11]) could be 
compiled into an IDEAL program whose fine details
could then be tuned. 
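The perpendicular-vector idiom relies on the fact that multiplying a complex number by (0,1), that is, by i, rotates it a quarter turn counterclockwise. A small Python sketch (the helper name `unit_perp` is hypothetical) shows the same computation:

```python
def unit_perp(x, y):
    """Unit vector perpendicular to the segment from x to y."""
    return 1j * (y - x) / abs(y - x)


p = unit_perp(0 + 0j, 3 + 4j)
print(p)  # a unit vector at right angles to the direction 3+4j
```

The dot product of `p` with the direction y - x is zero, and `abs(p)` is 1, which is what the wavy pen needs to offset its intermediate points.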

† UNIX is a trademark of Bell Laboratories.





We also have not addressed explicitly questions 
about the interaction of different pieces of IDEAL. For
example, which set of constraints should be evaluated 
first when a box is placed — the constraints in the 
parameter section, or those in the definition itself? 
We evaluate the parameter section first, so that a box 
definition may include default values for the vari- 
ables, but these values can be overridden by equa- 
tions in the parameter section. Experience with the 
implementation also suggests that it is convenient to 
interpret "a.b.c.x" as meaning “the value of x in the 
environment a.b.c" rather than “the value of x that 
lives at a.b.c." Bob Tarjan suggests that it might be 
useful to be able to talk about "x who lives at my 
uncle's" explicitly, rather than having to name the 
uncle u and refer to u.x. 

Would we want to include flow-of-control state-
ments in IDEAL? Conditionals would be pretty easy,
as long as we insisted that the condition could be 
evaluated before the decision about which action to 
take was made. Including general iteration raises 
more interesting questions: we need a means of defin- 
ing names of instances of boxes dynamically if we 
ever plan to refer to them again. (The boxes placed 
by pen statements are not named, so we cannot refer 
to their local variables outside the pen statement.) 
Another problem is that IDEAL variables are assigned 
at most once, so even the simple iteration construct 
"for i = 1 to n" doesn’t fit: maybe we need different 
kinds of variables. It is only fair to point out that the 
pen statement can be used as an iteration construct, 
although it may not be intuitive. For instance, to 
draw a dashed arc, we can say 



conn 0 to 180 

using 10 arc { 
center = 0; 
radius = 1 ; 

} <startang , 9+endang> ; 

(Really! Look at the formal explanation of pens to 
understand why.) 
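To see why this yields dashes, one can expand the pen by hand: instance i has startang pinned to ((i-1)/10)[0,180] and 9+endang pinned to (i/10)[0,180], so each arc spans 9 degrees and is followed by a 9 degree gap. The following Python sketch computes those angles; the function name and its `gap` parameter are hypothetical, chosen for illustration.

```python
def dashed_arc_angles(a=0.0, b=180.0, n=10, gap=9.0):
    """Return (startang, endang) in degrees for each of the n arc instances."""
    dashes = []
    for i in range(1, n + 1):
        startang = a + (i - 1) * (b - a) / n   # x = ((i-1)/n)[a,b]
        endang = a + i * (b - a) / n - gap     # y = gap + endang = (i/n)[a,b]
        dashes.append((startang, endang))
    return dashes


dashes = dashed_arc_angles()
for startang, endang in dashes:
    print(startang, endang)
```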

Some of the ideas in IDEAL have been adopted
in other graphics languages. Brian Kernighan's PIC
language[11] can draw a proper subset of the
diagrams that IDEAL can draw, but it does not require
the user to understand simultaneous equations to
define a picture. PIC includes the ability to refer to
points of a figure once it has been placed. Steve
Johnson's language i[10] for VLSI design includes this 
feature and allows point locations to be defined by 
inequalities on x- and y -coordinates. Neither 
language, however, allows much interaction between 
x- and y -coordinates, and neither supports complex 


arithmetic. 

IDEAL has been implemented under UNIX as a
preprocessor to TROFF, like EQN[12] or TBL[15]. This is
a limitation, since, for example, it is not feasible to
implement even the simple textboxes described above:
all the figure processing has to be done before TROFF
gets a chance to determine the size of the typeset
string. Instead, IDEAL includes three commands to
place text with its left-hand endpoint, its center, or its
right-hand endpoint at a given point. We plan to
implement IDEAL as a full-fledged partner in the
typesetting team. Eventually, this could aid in layout
decisions, where the typesetting system tells IDEAL
how much room is left on the page, and IDEAL scales
the figure to fit.

A question frequently posed is whether an 
interactive implementation could free the user from 
the need to specify so much detail in programs. If we 
surrender the underlying language entirely, as is done 
in Draw [2] and Markup [17], we lose most of the ability 
to write programs to generate pictures that include a 
high degree of regularity. It would also be a major 
effort to incorporate into the interactive description a 
cue for where textual labels should appear. A promis- 
ing approach is to develop a hybrid system that 
allows access to both language-directed programming 
and interactive graphics, with the system keeping 
both representations current. A step in this direction 
(developed for a very different application) has been 
taken by Borning[3].

Why IDEAL 

Among the names considered for the language 
were GRaphics OUtput Package and Really INtegrated 
Graphics. However, the language really is more than 
a set of standard graphics subroutines, but is not (as 
confessed above) truly integrated yet with the rest of 
the text processing system. Since it is certainly a 
GROUP, then, but not quite a RING, it must be an IDEAL.

Acknowledgements 

An embryonic form of IDEAL is described in [20],
and IDEAL was first presented in this form in [21]. I
have hashed over these ideas with many people, but 
my dissertation reading committee — Donald E. Knuth, 
Robert E. Tarjan, Brian W. Kernighan, and Luis I.
Trabb Pardo — deserves special mention for hours of 
careful reading and criticism as the language took 
shape. 

References 

1. Bengt Aspvall and Richard E. Stone, 
"Khachiyan's linear programming algorithm," 
Journal of Algorithms 1(1), pp. 1-13 (1980). 

2. Patrick C. Baudelaire, "Draw manual," pp. 97- 
128 in Alto User's Handbook, Xerox Corporation, 
Palo Alto, California (1979). 




3. Alan Borning, Thinglab: A Constraint-Oriented
Simulation Laboratory, Stanford University (1979).
Ph.D. dissertation.

4. B. F. Caviness, "On canonical forms and sim- 
plification," Journal of the ACM 17(2), pp. 385- 
396 (1970). 

5. George B. Dantzig, Linear Programming and 
Extensions, Princeton University Press (1963). 

6. Martin Davis, "Hilbert's tenth problem is unde-
cidable," American Mathematical Monthly 80(3),
pp. 233-269 (1973).

7. George E. Forsythe and Cleve B. Moler, Com- 
puter Solution of Linear Algebraic Systems, Prentice 
Hall (1967). 

8. Michael R. Garey and David S. Johnson, Com- 
puters and Intractability: A Guide to the Theory of 
NP-Completeness, W. H. Freeman (1979). 

9. S. C. Johnson, Yacc: yet another compiler-compiler, 
Bell Laboratories, Murray Hill, New Jersey 
(1978). 

10. S. C. Johnson and S. A. Browning, The LSI 
design language i, Bell Laboratories, Murray Hill, 
New Jersey (1980). 

11. Brian W. Kernighan, "PIC — a crude graphics
language for typesetting," Proceedings of the 
ACM SIGOA/SIGPLAN Conference on Text Mani- 
pulation (1981). 

12. Brian W. Kernighan and Lorinda L. Cherry, "A
system for typesetting mathematics," Communi- 
cations of the ACM 18(3), pp. 151-157 (1975). 

13. Donald E. Knuth, TeX and METAFONT: New
Directions in Typesetting, Digital Press, Educa- 
tional Services, Digital Equipment Corporation, 
Bedford, Massachusetts 01730 (1979). 

14. H. E. Kulsrud, "A general purpose graphic 
language," Communications of the ACM 11(4), 
pp. 247-254 (1968). 

15. M. E. Lesk, Tbl — a program to format tables, Bell 
Laboratories, Murray Hill, New Jersey (1976). 

16. Ju. V. Matiyasevic, "Enumerable sets are 
diophantine," Soviet Mathematics Doklady 11(2), 
pp. 354-358 (1970). 

17. William M. Newman, "Markup manual," pp. 
85-97 in Alto User's Handbook, Xerox Corpora- 
tion, Palo Alto, California (1979). 

18. M. G. Notley, "A graphical picture drawing 
language," Computer Bulletin 14(3), pp. 68-73 
(1970). 

19. Ivan E. Sutherland and Gary W. Hodgman, 
"Reentrant polygon clipping," Communications of 
the ACM 17(1), pp. 32-42 (1974). 

20. Christopher J. Van Wyk, A graphics language for 
typesetting, Bell Laboratories, Murray Hill, New 
Jersey (1979). 


21. Christopher John Van Wyk, A Language for 
Typesetting Graphics, Stanford University (1980). 
Ph.D. dissertation. 


Appendix 1 

Here are the definitions for circles and circular
arcs. The ~ operator means that IDEAL will not com-
plain if the equation is inconsistent.


circle {

var rad, radius, center,
z1, z2, z3;

var a1, b1, c1, a2, b2, c2;
z1 ~ z2 ~ z3 ~ center + rad;
a1 = 2*(re(z2)-re(z1));
b1 = 2*(im(z2)-im(z1));
c1 = - re(z1)*re(z1)
     - im(z1)*im(z1)
     + re(z2)*re(z2)
     + im(z2)*im(z2);

a2 = 2*(re(z3)-re(z2));
b2 = 2*(im(z3)-im(z2));
c2 = - re(z2)*re(z2)
     - im(z2)*im(z2)
     + re(z3)*re(z3)
     + im(z3)*im(z3);

a1*re(center) + b1*im(center) = c1;
a2*re(center) + b2*im(center) = c2;
radius = abs(rad);
rad ~ radius;

}


arc {

var radius, center,
start, midway, end,
startang, midang, endang;
var a1, b1, c1, a2, b2, c2;
start = center
      + abs(radius)*cis(startang);
end = center
      + abs(radius)*cis(endang);
a1 = 2*(re(end)-re(start));
b1 = 2*(im(end)-im(start));
c1 = - re(start)*re(start)
     - im(start)*im(start)
     + re(end)*re(end)
     + im(end)*im(end);
a2 = 2*(re(end)-re(midway));
b2 = 2*(im(end)-im(midway));
c2 = - re(midway)*re(midway)
     - im(midway)*im(midway)
     + re(end)*re(end)
     + im(end)*im(end);

a1*re(center) + b1*im(center) = c1;
a2*re(center) + b2*im(center) = c2;
startang = atan2(start - center);
midang = atan2(midway - center);
endang = atan2(end - center);
radius = abs(start - center);
radius = abs(midway - center);
radius = abs(end - center);
midway ~
center + abs(radius)*cis(midang);
midway ~ start;

}




Prettyprinting in an Interactive Programming Environment 


Martin Mikelsons 
Computer Science Department 
IBM T. J. Watson Research Center 
Yorktown Heights, New York 


ABSTRACT: Prettyprint algorithms designed for 
printing programs on paper are not appropriate in 
an interactive environment where the interface to 
the user is a CRT screen. We describe a data repre- 
sentation and an algorithm that allow the efficient 
generation of program displays from a parsed inter- 
nal representation of a program. The displays show 
the structure of the program by consistent and 
automatic indentation. They show the program in 
varying levels of detail by replacing unimportant 
parts with ellipsis marks. The relative importance 
of program parts is determined jointly by the 
structure of the program and by the current focus 
of attention of the programmer. 


INTRODUCTION 


Prettyprint algorithms designed for printing pro- 
grams on paper are not appropriate in an interac- 
tive environment where the interface to the user is 
a CRT screen. A carefully formatted listing is a 
very readable representation of a program. 
However, when this listing is presented on a 
screen, each instantaneous view presents only a 
fragment of the program and most of the time impor- 
tant contextual information is not visible. In an 
interactive programming environment [1, 2, 4, 5] 
where the system has access to the parsed represen- 
tation of the program, and where the system has a 
record of the recent editing activity of the user, 
we can profitably go beyond conventional pretty- 
printing methods to present a useful and coherent 
view of the program to the user at the display con-
sole.

We can determine what part of the program is cur-
rently of interest to the user. We can elide por-
tions of the program that seem irrelevant to the
current picture in order to bring closer together
widely separated but important features. We can
suppress the text that represents the detailed
structure of statements and expressions in order to
show an overview of the structure of a program. We
can use abbreviations and special symbols to con-
dense the text. We can use different fonts or
colors to distinguish keywords from identifiers.
In short, each instantaneous view of the program
can be generated to show the features and compo-
nents that are significant at that particular
moment. Figure 1 shows for a simple example the
difference between our approach and conventional
prettyprinting.


Permission to copy without fee all or part of this material is granted
provided that the copies are not made or distributed for direct
commercial advantage, the ACM copyright notice and the title of the
publication and its date appear, and notice is given that copying is by
permission of the Association for Computing Machinery. To copy
otherwise, or to republish, requires a fee and/or specific permission.

© 1981 ACM 0-89791-043-5/81/0600/0108 $00.75

The notion of a condensed form for expressions has 
been a feature of Lisp expression editors from 
their earliest origins [3, 6]. More recently, 
several interactive programming environments sup- 
port some form of ellipsis [2, 4, 5]. In all 
cases, however, the level of detail is controlled 
by a user parameter or by explicit suppression of 
specific phrases. We feel that a programming tool 
should not require that degree of user involvement 
in the ubiquitous process of filling the screen 
after each interaction. 


(a) A window on a file

        call b;
      end;
    else
      do;
        x=x+1;
        y=y+1;
      end;
    end;

(b) A window on a program

    PROC(x,y);
      IF x<y
        THEN ...
        ELSE DO;
               x=x+1;
               y=y+1;
             END;
    END;

Figure 1. Two views of a program fragment




    IF f(x, y, z)
      THEN DO; x=1;
               y=2;
               z=3;
           END;
      ELSE DO; x=2;
               y=3;
               z=4;
           END;

Figure 2. A sample compound statement 


Overview of the Display Generation Process 


There are many parameters that control the format 
and content of the kind of display we have 
described. We could require that the user specify 
all these parameters. But since we are dealing 
with an interactive programming environment, the 
user is constantly issuing commands that manipulate 
and modify the program shown on the screen. We can 
usually infer from these commands what part of the 
program is of interest to the user and how it 
should be displayed. We have isolated three major 
modes that cover most of the activities that pro- 
grammers engage in when editing programs. Each 
mode defines a focus of attention and a strategy 
for displaying the contents and context of the
focus.

Edit Mode: In edit mode, the user is examining or 
changing a program in detail. The focus of atten- 
tion is taken to be the most recently selected or 
modified phrase; the strategy is to show some of 
the context in which the focus occurs, and an ade- 
quate amount of detail within the focus. The rele- 
vant context is usually some indication of the 
phrase that contains the focus, such as a DO group 
or IF statement, and some of the phrases that imme- 
diately precede and follow the focus. Figure 3 
shows several views of a statement in edit mode. 
Edit mode is also in effect during interactive 
debugging of a user program. In this case, the 
focus is selected by the interpreter to show the 
progress of execution through the program. 

Multiple Focus Mode: In multiple focus mode, the 
focus of attention is a set of phrases selected 
from the program by a search or an analysis func- 
tion of the system. The display strategy is to 
show all or most of the phrases in the focus and 
some of the program structure that contains them 
and that connects them together. Figure 4 illus- 
trates this mode of display. This mode can also be 
used to bring together on one screen two or more 
widely separated parts of a program in order to 
compare them. 

Reading Mode: This mode is used to read a program 
sequentially. In reading mode, the strategy is to 
show a maximum of detail in the center of the dis- 
play area and to force most ellipses to the first 
and last lines; by providing commands that shift 
this window to the next part of the program, we 
simulate in a more structured way the process of 
scrolling through a source file. Figure 5 shows 
two successive reading mode screens. 


(a) From the top:

    IF f(x, y, z)
      THEN DO; x=1;
               y=2;
           END;
      ELSE DO; x=2;
           END;

(b) From inside the THEN clause:

    IF f(x, y, z)
      THEN DO; x=1;
               y=2;
               z=3;
           END;
      ELSE DO; x=2; ... END;

(c) From inside the ELSE clause:

    IF f(x, y, z)
      THEN DO; x=1; ... END;
      ELSE DO; x=2;
               y=3;
               z=4;
           END;

(d) After adding a statement to the ELSE clause:

    IF f(...) THEN DO; ... END;
      ELSE DO; x=2;
               y=3;
               z=4;
               w=5;
           END;

Figure 3. Edit Mode: Four views of the
statement in Figure 2 on six 30
character lines.


Once we have determined what part of the program is 
of interest to the user, our display generation has 
two main steps. First, we generate a data struc- 
ture that contains the information necessary to 
format each part of the program. Second, given a 
specific screen area, a focus of attention, and a 
strategy that determines the relative importance of 
program units, we allocate the available display 
area to a subset of the program text. 


    IF f(x, y, z)
      THEN DO; x=1;
               y=2; ...
           END;
      ELSE DO; ... y=3; ...
           END;

Figure 4. Multiple Focus Mode: The statement 
in Figure 2 on six 30 character 
lines after selecting all 
occurrences of y in the statement. 






    IF f(x, y, z)
      THEN DO; x=1;
               y=2;
               z=3;
           END;
      ELSE DO; x=2; ... END;


    IF f(x, y, z) THEN ...
      ELSE DO; x=2;
               y=3;
               z=4;
           END;


Figure 5. Reading Mode: Two views of the 
statement in Figure 2 on six 30 
character lines. 


The first step is to generate from the concrete 
parse tree of the program a tree structure of 
objects we call boxes. While the parse tree
reflects the syntactic distinctions determined by a 
formal grammar, the box structure reflects the 
semantic and esthetic distinctions involved in for- 
matting a program. 

One function of the box structure is to simplify 
the decisions made during formatting. While the 
grammar of a typical language may define hundreds 
of non-terminal symbols, or node types, the box 
structure is composed of fewer than ten different 
kinds of boxes. The box types reflect formatting 
concepts such as functional grouping, concat- 
enation and vertical alignment. 

A second, equally important function of the box 
structure is to allow re-structuring of the parse 
tree. For example, many grammars parse lists of 
statements into left or right recursive binary 
nodes. The natural representation for formatting 
purposes is an ordered list of statements. Another 
example of re-structuring is a subscripted variable 
such as 'a(i)'. A natural decomposition would be 
into two parts 'a' and ' (i) ' , but many grammars 
parse this example into a structure where the sub- 
script part does not exist as a node. 
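
The first kind of re-structuring can be sketched in a few lines of Python. The node shape used here (a `StmtList` with `head` and `tail` fields) is a hypothetical stand-in for whatever a particular grammar produces; it is not taken from the system described in this paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StmtList:
    """Hypothetical right-recursive parse node:
    stmt_list -> stmt stmt_list | stmt"""
    head: str                            # text of one statement
    tail: Optional["StmtList"] = None    # rest of the list, None at the end


def flatten(node: Optional[StmtList]) -> List[str]:
    """Collect the statements of a right-recursive list in source order."""
    out = []
    while node is not None:
        out.append(node.head)
        node = node.tail
    return out


tree = StmtList("x=1;", StmtList("y=2;", StmtList("z=3;")))
statements = flatten(tree)   # an ordered list, ready to become one box
```

The flattened list maps directly onto the components of a single composite box, which is the form the formatter wants to work with.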

The box structure must be generated from a top-down 
scan of the concrete parse tree in order to recog- 
nize correctly the various constructs of the lan- 
guage. 

The second step is the display allocation process. 
In contrast to the top-down nature of box gener- 
ation, this step is typically an inside-out 
process. The key parameters to the display allo- 
cation process are a window that defines the avail- 
able display area, a focus that defines the user's 
focus of attention in terms of the box structure, 
and a strategy that defines the relative priorities 
of the actions possible during the process. The 
goal of display allocation is to fill the window 
with text that shows the context and contents of 
the focus. The resource to be allocated is the 
available display area; the contenders for this 
resource are the parts of the program within and 
around the focus. Since allocation is done with 
respect to the box structure, it is entirely inde- 
pendent of the source language. 


In order to create a display we simply send to the 
display device the characters allocated by the sec- 
ond step. This is the only device-dependent part 
of the process. 


In the following sections, we first describe the 
structure of boxes and how they are used to repre- 
sent the semantic and esthetic decisions involved 
in formatting a program. We then describe the 
allocation algorithm and how it is varied to 
reflect the strategies necessary to implement the 
three main display modes. Since all the steps in 
display generation may be invoked as a result of 
each user interaction with the system, the neces- 
sary computations must be performed within severe 
time constraints. We show how the algorithm can be 
tuned to meet these constraints. In the last sec- 
tion we describe how box generation is parametrized 
with respect to language. 


BOX CHARACTERISTICS 


The purpose of each box is to summarize the for- 
matting information necessary to display that par- 
ticular part of the program. The purpose of the 
box structure is to represent the nesting structure 
of the components of a program. There are atomic 
boxes that describe individual symbols in a 
program, and composite boxes that describe groups 
of symbols and boxes. Each box also contains a 
number of flags and parameters that define the 
desired format for that box. A key feature of the 
box structure is that it defines at each point in 
the program a collection of possible displays that 
consume increasing amounts of display area. In all 
cases the parameters represent intentions, and not 
exact specifications. 


Properties of Program Text 


A salient property of program text is the fact that 
a program reads from left to right as a 
one-dimensional string of symbols; at the same 
time, the meaning of the program is defined by a 
tree structure of nested composite objects. The 
program string is made readable by breaking it up 
into symbols and groups of symbols and by separat- 
ing these groups on different lines of a page. 
Membership in a higher grouping is shown by verti- 
cal alignment while depth of nesting is shown by 
the degree of indentation. We note however that 
when a structure is small, it is not necessary to 
lay it out vertically to make it readable; the eye 
can parse it as well as the machine. 

In an interactive display situation we can use 
these properties to advantage in order to increase 
the amount of useful information on each screen. 
Small structures can be shown on several lines if 
they are near the focus of attention; they can be 
shown condensed on one line if they are part of 
relatively less important context. Large struc- 
tures may also show as small structures if they are 
condensed by ellipses. 

Figure 6 and Figure 7 show some of the box struc- 
ture for the statement in Figure 2. 





Atomic Boxes 


Atomic boxes describe an occurrence of a symbol in 
a program. The main information associated with an 
atomic box is the text of the symbol. The text in 
an atomic box is identified as either a special 
symbol, a keyword, an identifier or a constant. 

Keywords and identifiers are often not distinguish- 
able in the source string, yet in the abstract rep- 
resentation, their meaning is very different. By 
distinguishing keywords from identifiers in the box 
structure, we can identify them in the display. 
The simple expedient of showing keywords in upper- 
case characters and identifiers in lowercase 
characters enhances the readability of a program. 
We can also distinguish them by color or font on a 
display device with that capability. 

Constants can be shown as entered, in canonical 
form, or in abbreviated form, to achieve different 
effects in the display. 
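
As a small illustration of how the kind recorded in an atomic box can drive the display, the following Python sketch applies the uppercase/lowercase convention described above. The class name `AtomicBox`, its fields and the `render` function are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass
class AtomicBox:
    text: str
    kind: str   # "special", "keyword", "identifier" or "constant"


def render(box: AtomicBox) -> str:
    """Show keywords in uppercase and identifiers in lowercase;
    special symbols and constants are shown as entered."""
    if box.kind == "keyword":
        return box.text.upper()
    if box.kind == "identifier":
        return box.text.lower()
    return box.text


tokens = [
    AtomicBox("if", "keyword"),
    AtomicBox("F", "identifier"),
    AtomicBox("(", "special"),
    AtomicBox("X", "identifier"),
    AtomicBox(")", "special"),
]
line = " ".join(render(t) for t in tokens)
```

On a device with color or fonts, the same `kind` field could select a highlight instead of a letter case.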


Composite Boxes 


Composite boxes define the components of a compos- 
ite object in a program. The primary attribute of 
a composite box is a list of component boxes. In 
addition, composite boxes contain attributes that 
define how the components should be formatted rela- 
tive to each other. 


    1: 2: 3: |if|  4: |f(...)|
       5: 6: |then|
          7: 8: |do;|
             9: 10: |x=1;|
                11: |y=2;|
                12: |z=3;|
             13: |end;|
       14: |else ...


Figure 6. Part of the box structure for the 
          statement in Figure 2 


The vertical positioning of text in a composite box 
is determined by explicit or implicit cut points. 
If a box is a potential cut point, then the text 
within that box may begin on a new line of the dis- 
play. If a box is not a potential cut point, then 
the text in that box is always concatenated to the 
text on the current line. In a vertical box, every 
immediate component is an implicit cut point. In 
addition, the components of a vertical box must all 
appear on the same line, or they must all appear on 
distinct lines. In horizontal boxes there are no 
implicit cut points; any potential cut points in a 
horizontal box must be explicitly declared during 
box generation. In a block box, every component is 
also an implicit cut point, but several components 
may appear on a line. 
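
The all-or-nothing rule for vertical boxes can be illustrated with a minimal Python sketch. It decides purely on available width, which is a simplification: in the allocation process described later, cut points are allocated by priority. The function name and parameters are hypothetical.

```python
def layout_vertical(components, width, separate_with_space=True):
    """Lay out the components of a vertical box.

    Either all components share one line, or each component
    gets a line of its own; no mixed layout is produced.
    """
    sep = " " if separate_with_space else ""
    one_line = sep.join(components)
    if len(one_line) <= width:
        return [one_line]
    return list(components)


body = ["x=1;", "y=2;", "z=3;"]
wide = layout_vertical(body, 30)     # everything fits on one line
narrow = layout_vertical(body, 10)   # one component per line
```

A block box would differ only in the second branch, where it would pack as many components as fit on each line.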


Box          Kind         Components   Other attributes

1            Horizontal   2, 5, 14     indent=5
2            Horizontal   3, 4         required expansion
3            Atomic                    required expansion
                                       keyword
4            Block
5            Horizontal   6, 7         cut point
6            Atomic                    required expansion
                                       keyword
7            Horizontal   8, 9, 13     indent=4
8            Block                     required expansion
                                       no separating spaces
9            Vertical     10, 11, 12   conditional cut point
                                       merge inner ellipses
                                       separate parts with space
10, 11, 12   As required
13           Block                     required expansion
                                       offset=-2
14           Similar to 5.


Figure 7. Details of the boxes in Figure 6 








A composite box must specify the indentation of the 
components relative to the origin of the box. We 
also want to specify whether the components must be 
separated by a space if they appear on the same 
line, and whether adjacent elided components can be 
merged into a single ellipsis. If the standard 
priority allocation is not suitable to the compo- 
nents of a box we can specify the relative 
priorities of the components. 


Relation to Nearby Boxes 


Each box has attributes that specify the relation- 
ship of that box to its display environment. In 
addition to the implicit spacing of components, we 
may want to specify the leading and trailing spaces 
required for a box. This parameter is applied only 
if the box is adjacent to another on the same line. 
We can also specify an offset that shifts the ori- 
gin of the box relative to the current left margin. 
Since this number may be positive or negative, it 
can be used to shift text to the left where it 
stands out in a sequence of indented lines. 

Each box can have an explicit cut point 
declaration. High priority cut points are allo- 
cated when encountered; low priority cut points are 
allocated only if the line containing them over- 
flows the display area. Conditional cut points are 
allocated only if the origin of the box overlaps 
text on the current line. 
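
The three kinds of explicit cut point can be summarized as a small decision function. This Python sketch is one reading of the rules above; in particular, it assumes that "the origin of the box overlaps text on the current line" means the origin column lies to the left of the position already reached on that line. All names are hypothetical.

```python
def should_break(cut_kind, line_overflows, origin_col, current_col):
    """Decide whether a box with an explicit cut point starts a new line.

    cut_kind        "high", "low", "conditional", or None (no cut point)
    line_overflows  True if the line containing the box exceeds the window
    origin_col      column at which the box would like to start
    current_col     column already reached by text on the current line
    """
    if cut_kind == "high":
        return True                       # allocated when encountered
    if cut_kind == "low":
        return line_overflows             # allocated only on overflow
    if cut_kind == "conditional":
        return origin_col < current_col   # assumed meaning of "overlaps"
    return False
```

For example, a conditional cut point on the body of a DO-group would leave `DO; x=1;` on one line when the body's origin is at or beyond the end of `DO;`, and would break otherwise.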


Miscellaneous Flags and Parameters 


A box can be flagged as a potential focus or a 
potential frame. A potential focus is a box that 
contains a phrase of the abstract syntax. A poten- 
tial frame is a box that may be the outermost box 
that shows in a display window. In a PL/1 
DO-group, for example, the group may be formatted 
as a box of three components: the header, the body, 
and the ending statement. All three components may 
be selected as a focus, but it may be desirable to 
specify that the body box is not suitable as a 
frame. 

A box can be marked as requiring expansion. In 
this case whenever an attempt is made to show the 
box at all, an attempt must be made to show the com- 
ponents instead. This feature prevents important 
connective keywords or delimiters from being hidden 
in ellipses. 

Display features such as brightening, font or color 
can be invoked by specifying a static or dynamic 
highlight class. The static highlight class of a 
box is determined during box generation, while the 
dynamic highlight class changes during a session. 
The distinction between keywords and identifiers is 
an example of static highlight class. The focus is 
normally identified in every display by marking the 
appropriate boxes with a dynamic highlight class. 

Static and dynamic highlight class are used to 
index into a table of effective highlight classes 
which are in turn mapped into an effect on the dis- 
play. On a display device with several colors, 
several effective highlight classes can be dis- 
played simultaneously. On a display device with 
only one highlight possibility, such as 
brightening, only one effective highlight class can 
be shown at one time, and the meaning of the dis- 
played highlight must be identified in an 
accompanying legend. But the box structures are 
identical for the two devices. 
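
The two-level mapping can be sketched as a table lookup followed by a device-specific function. The class names, table entries and device functions in this Python example are invented for illustration; only the shape of the mapping follows the description above.

```python
# (static class, dynamic class) -> effective highlight class
EFFECTIVE = {
    ("keyword", None): "keyword",
    ("identifier", None): "plain",
    ("keyword", "focus"): "focus",
    ("identifier", "focus"): "focus",
}

COLORS = {"plain": "white", "keyword": "blue", "focus": "yellow"}


def color_device(effective):
    """A device with several colors shows every effective class."""
    return COLORS[effective]


def brightening_device(shown_class):
    """A device with brightening only shows one effective class at a time."""
    return lambda effective: "bright" if effective == shown_class else "normal"


def effect(static, dynamic, device):
    effective = EFFECTIVE.get((static, dynamic), "plain")
    return device(effective)
```

The box structure supplies only `static` and `dynamic`; swapping `color_device` for `brightening_device("focus")` changes the display without touching the boxes.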


Special Purpose Text 


Comments in most programming languages require spe- 
cial handling. Since comments are normally removed 
from the text by the lexical scanner, they do not 
even appear in the parse tree. It is also desira- 
ble to show a program with or without comments, or 
to replace sections of code by an associated com- 
ment as a form of user controlled ellipsis. 

A programming environment generates a large amount 
of descriptive information about the program being 
edited. A natural way of displaying this informa- 
tion is in the form of footnotes or asides in the 
display of the program. 

Since the box structure does not have to be identi- 
cal to the parse tree, we can include comments and 
annotations in the places that are most natural for 
them. By identifying comments and annotations in 
the box structure, we can generate displays with or 
without them by simply changing a parameter to the 
display allocation algorithm. 


THE DISPLAY ALLOCATION ALGORITHM 


The goal of the display allocation algorithm is to 
select a subtree, W, from the box structure, B, 
that represents the entire program. The root of W 
need not be the root of the program, and the leaves 
of W need not be the leaves of the program. Where 
the leaves of W coincide with leaves of B, the 
display shows the full text of the program; where 
the leaves of W are a sub-tree of B, the display 
shows an ellipsis. 

The subtree W is generated as a sequence W(0), 
W(1), ... of subtrees where each W(i) is a proper 
subtree of W(i+1). Given that the display of W(i) 
fits in the available screen area, the potential 
successors of W(i) are defined either by an ascent 
step or by one of several expansion steps . An 
ascent step includes W(i) in a larger subtree; an 
expansion step extends a leaf of W(i) to include 
more of the detailed structure of B. W(i+1) is 
the first potential successor of W(i) that fits 
on the screen; if there are no potential 
successors, or none of them fit, then W is W(i). 
The ordering of the potential successors is deter- 
mined by relative priorities assigned to the ascent 
step and to the possible expansion steps. 
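
The growth of W can be sketched as a simple loop that always takes the first potential successor that fits. In this Python toy, a "subtree" is just a tuple of phrases and only expansion-like steps are modeled; the phrase list, the width and the helper names are assumptions for the example, not part of the algorithm as published.

```python
def allocate(initial, successors, fits):
    """Grow W(0), W(1), ... until no potential successor fits."""
    w = initial
    while True:
        for candidate in successors(w):   # ordered by priority
            if fits(candidate):
                w = candidate             # this becomes W(i+1)
                break
        else:
            return w                      # none fit: W is W(i)


ITEMS = ["y=3;", "x=22;", "z;"]   # hypothetical phrases, best priority first
WIDTH = 7                          # hypothetical window capacity in characters


def successors(w):
    for item in ITEMS:
        if item not in w:
            yield w + (item,)


def fits(w):
    return sum(len(item) for item in w) <= WIDTH


result = allocate((), successors, fits)
```

Here `x=22;` is skipped because it would overflow, and the lower-priority `z;` is taken instead, which is the behavior the definition of W(i+1) prescribes.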

A possible implementation of the allocation algo- 
rithm could be to start at the focus F and assign 
priorities to all the boxes in B. We could then 
examine the boxes in B in order of decreasing 
priority and perform ascent or expansion steps as 
necessary until W was found. This implementation is 
slow because it must repeatedly scan portions of B 
that will never appear on the screen. 

First, we describe a faster algorithm that combines 
the assignment of priorities with ascent and expan- 
sion steps and that implements this process by a 
marking process on B. The potential successors and 
their priorities are generated by each ascent or 
expansion step. We then describe refinements that 
increase the efficiency of the process, and that 
improve the appearance of the resulting display. 
We will describe the process for edit mode and then 
indicate how it is modified to implement multiple 
focus mode and reading mode. 


allocate: procedure; 
  top: If empty_queue(P) then return; 
       b := first_in_queue(P); 
       R, S, T := empty_list; 
       c := c+1; 
       If is_ascent_mark(b) 
          then call ascend; 
          else call expand(b); 
       call measure; 
       If ¬window_overflow 
          then begin; 
             add_list_to_queue(T, P); 
             for y in R call mark(y, a); 
             for y in S call mark(y, a); 
          end; 
          else continue; 
       go to top; 
end allocate; 




Data Structures and Initialization 


The marking of B is done with two numbers, a and 
c. The number a marks boxes that are definitely 
part of W; the number c is used to mark boxes in 
a potential successor. If the potential successor 
fails, the boxes in it remain marked with an obso- 
lete value of c and are not scanned again; other- 
wise they are re-marked with a. The number c is 
incremented by one for each potential successor 
attempt. The numbers are initialized to x+1, where 
x is the highest value of c in the preceding 
display computation. As a result, the marking per- 
formed during one display computation cannot be 
confused with markings left over from previous 
passes. 
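
The marking scheme amounts to stamping boxes with generation numbers, so that nothing ever has to be cleared. The following Python sketch shows one way to read it; the class `Marker`, its method names and the use of a dictionary keyed by box name are all assumptions made for the example.

```python
class Marker:
    def __init__(self, start):
        self.a = start    # marks boxes that are definitely part of W
        self.c = start    # marks boxes in the current potential successor
        self.mark = {}    # box name -> last number written on it

    def begin_attempt(self):
        """Start a new potential successor."""
        self.c += 1

    def tentative(self, box):
        self.mark[box] = self.c

    def confirm(self, boxes):
        """The successor fits: re-mark its boxes with a."""
        for box in boxes:
            self.mark[box] = self.a

    def status(self, box):
        m = self.mark.get(box)
        if m is None:
            return "unvisited"
        if m == self.a:
            return "in W"
        if m == self.c:
            return "candidate"
        if self.a < m < self.c:
            return "abandoned"    # left over from a failed attempt
        return "unvisited"        # stale mark from an earlier display


m = Marker(start=11)        # 11 = x+1 for a previous maximum x = 10
m.mark["old"] = 7           # residue of an earlier display computation
m.begin_attempt()
m.tentative("b1")
m.confirm(["b1"])           # first attempt succeeds
m.begin_attempt()
m.tentative("b2")           # second attempt fails, b2 is left as is
m.begin_attempt()
```

After this sequence `b1` is part of W, `b2` carries an obsolete value of c, and `old` is indistinguishable from a box that was never visited.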

The data objects used by the algorithm are a prior- 
ity queue P, and three lists R, S, and T. P 
contains boxes and ascent marks ordered by 
priority. An ascent mark is a dummy box that is 
used as a place holder with a specific priority. R 
is a list of boxes that start a new line in a poten- 
tial successor; S is a list of new boxes included 
in a potential successor; T is a list of boxes at 
the fringe of S. 

The initial contents of the queue P is determined 
by the current focus and the current display mode. 
In edit mode, with a single focus, we set the pri- 
ority of the focus to 1 and enter the focus in P. 
If the focus is not the root of B, we also add an 
ascent mark to P. 


The Basic Algorithm 

The display allocation process consists of three 
procedures described here in an informal program- 
ming language. The main entry to the algorithm is 
the procedure allocate. The procedure expand 
describes an expansion step, and the procedure 
ascend an ascent step. 


expand: procedure(x); 
   call mark(x, c); 
   call add_to_list(x, S); 

   If is_atomic_box(x) then return; 

   /* Otherwise x must be composite. */ 

   For each y in box_components(x) 
      if has_req_expand_flag(y) 
         then call expand(y); 
         else call add_to_list(y, T); 

   return; 
end expand; 

ascend: procedure; 
  top: x := box_ancestor(W); 
   call mark(x, c); 

   For each y in box_components(x) 
      if y=W then continue; 
      else if has_req_expand_flag(y) 
         then call expand(y); 
         else call add_to_list(y, T); 

   If ¬has_pot_frame_flag(x) 
      then go to top; 

   If has_ancestor(x) 
      then call add_ascent_mark; 

   Return; 
end ascend; 

The purpose of the procedure measure is to deter- 
mine whether a potential successor fits or can be 
made to fit in the available display area. The 
measuring process is a depth first scan of B, 
starting at the root of W and including all boxes 
marked with the current values of a and c. Lines 
are allocated on the basis of allocated cut points. 

As text producing boxes are encountered in the 
scan, the effective length of the current line is 
incremented appropriately. When an allocated cut 
point is encountered, a new line is started with an 
initial indent determined by the indent parameter 
of the ancestor of the cut point and the offset 
parameter of the cut point. Cut points, like 
expansions and ascents, are marked with the numbers 
a or c. 

During this scan, we compute the length of each 
line of the display at the current level of expan- 
sion, and we build a priority queue D that con- 
tains potential cut points ordered by priority. If 
all the lines are within the limits of the display 
area, we discard D and return to the main algo- 
rithm. If there are lines that overflow the 
display area and if there are lines still 
available, we remove entries from D, mark them 
with c as allocated cut points, and place them in 
R. At this point we must repeat the measuring pro- 
cess. 


Line Allocation 


The process of line allocation interacts closely 
with the state of the display at every stage of 
expansion. If lines are allocated too slowly, 
expansion steps will fail early and the resulting 
display will not fill all the available space. If 
lines are allocated too readily, then the resulting 
display will appear thin and will not make good use 
of the available horizontal space. 

As each line is generated during the measuring 
scan, required cut points are entered in D as they 
are encountered. Optional cut points are remem- 
bered until the end of the line is reached; if the 
line fits in the available space, the optional cut 
points are ignored, otherwise, they are added to D 
with a degraded priority. If the text on a line is 
inside a block format box, the only possible cut 
point that is recognized is the first component of 
the block that begins within the available space 
and ends outside of it. 

Priority Allocation 


The priority of boxes during the expansion process 
is determined by a number that is computed for each 
box as it is reached. Increasing values of this 
number indicate progressively lower priority. In 
Edit Mode, when a composite box x with priority p 
is expanded, the priorities of the components are 
up, up+v, up+2v, ... , where u and v are con- 
stant parameters of the algorithm. When an ascent 
step is made from a box x with priority p, the 
priority of the ancestor of x is sp, the priori- 
ties of the components to the right of x are p+t, 
p+3t, ... , and the priorities of components to the 
left of x are p+2t, p+4t, ... . 
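
The priority formulas can be written out directly. In this Python sketch the function names are hypothetical, the parameter values in the example are arbitrary, and the siblings are assumed to be listed nearest-first, which the text does not state explicitly.

```python
def expansion_priorities(p, n, u, v):
    """Priorities of the n components of a box with priority p:
    up, up+v, up+2v, ..."""
    return [u * p + i * v for i in range(n)]


def ascent_priorities(p, n_left, n_right, s, t):
    """Priorities assigned by an ascent step from a box with priority p.

    Returns (ancestor, left siblings, right siblings); sibling lists
    are assumed to be ordered from nearest to farthest.
    """
    ancestor = s * p
    right = [p + (2 * i + 1) * t for i in range(n_right)]   # p+t, p+3t, ...
    left = [p + (2 * i + 2) * t for i in range(n_left)]     # p+2t, p+4t, ...
    return ancestor, left, right


components = expansion_priorities(p=1, n=3, u=2, v=1)
ancestor, left, right = ascent_priorities(p=1, n_left=2, n_right=2, s=3, t=1)
```

Since larger numbers mean lower priority, the interleaving of `left` and `right` lets context grow alternately on both sides of the focus, with the right side slightly favored.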

By varying the parameters s, t, u and v, we can 
vary the appearance of the display in terms of the 
amount of context or detail that shows relative to 
the focus. At the same time we retain a consistent 
appearance that favors the focus and loses detail 
as distance from the focus increases. 

The relative priority of potential cut points is 
determined by the structure of B. Thus when the 
focus is on B as a whole, the cut point priorities 
are the same as the expansion priorities. However, 
when the focus is on a subtree of B, the cut point 
priorities retain their top-down bias and the shape 
of the display in terms of vertical layout and 
indentation remains roughly constant as the focus 
shifts. 


Efficiency 

We can see from the above description that if k is 
the number of symbols showing in a final display, 
and n is the average number of components in each 
box, then in the worst case we will iterate nk 
times through the allocation algorithm. Since each 
iteration invokes a measuring step that will visit 
an average of k boxes, the total number of boxes 
visited to compute a display is nk^2. We can reduce 
the number of times the measuring step is invoked, 
and we can reduce the number of boxes visited dur- 
ing each measuring step. Note that in any case, 
the cost of computing a display is controlled by 
the size of the screen, not by the size or complex- 
ity of the program being displayed. 

The expansion process can be divided into two suc- 
cessive stages. The first stage lasts until all 
the available lines have been allocated. During 
this stage the measuring pass must record potential 
cut points. As soon as all the lines have been 
allocated, we enter the second stage in which the 
measuring process is much simpler and faster. 
Since most of the calls to the measuring procedure 
are in the second stage this optimization is very 
effective. 

It is not necessary to measure the entire display 
after each expansion step. If the expansion 
affects only one line, does not cause an overflow, 
and does not introduce any cut point candidates, 
then we can confirm the expansion and continue 
without re-measuring. We cannot compute the exact 
effect of the expansion from purely local informa- 
tion but we can estimate the maximum effect that 
could be produced. 

During the first stage of expansion, this is the 
best we can do, but in the second stage of expan- 
sion, we can make a much more accurate estimate. 
For each line we maintain a minimum possible length 
and a maximum length. For each expansion, we make 
a minimal and a maximal estimate of the effect of 
the expansion. If the minimal estimate causes the 
minimal length to overflow, then the expansion step 
will definitely fail in measuring, so we can dis- 
card the expansion without measuring. The maximal 
estimate works just as in the first stage. 
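
The two estimates lead to a three-way decision for each expansion in the second stage. The Python sketch below shows that decision for a single line; the function name and its parameters are hypothetical, and it leaves out the additional conditions stated above (the expansion must affect only one line and introduce no cut point candidates).

```python
def classify_expansion(line_min, line_max, delta_min, delta_max, width):
    """Decide how to handle an expansion without a full measuring pass.

    line_min, line_max    current minimum and maximum length of the line
    delta_min, delta_max  minimal and maximal estimate of the added text
    width                 number of character positions on a line
    """
    if line_min + delta_min > width:
        return "reject"     # certain to overflow: discard, no measuring
    if line_max + delta_max <= width:
        return "accept"     # cannot overflow: confirm, no measuring
    return "measure"        # undecided: run the measuring step


decisions = [
    classify_expansion(28, 29, 4, 6, 30),
    classify_expansion(20, 22, 3, 5, 30),
    classify_expansion(20, 24, 8, 10, 30),
]
```

Only the third case pays for a measuring pass, which is why tighter estimates translate directly into fewer scans.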

We can reduce the number of boxes scanned during 
the measuring process in the second stage by 
detecting boxes that have reached a stable form. 
An atomic box is stable if it was fully expanded in 
a previous step, i.e. it is marked with a; any box 
is stable if it is permanently elided, i.e. it was 
abandoned in a previous step and is marked with a 
value between a and the current value of c. Fur- 
thermore, a box is stable if all its components are 
stable. During each measuring step, we can propa- 
gate summary information to the ancestors of stable 
boxes and skip that descent subsequently. 
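
The stability test follows directly from the marks a and c. This Python sketch shows the recursive definition only; the `Box` class is a hypothetical minimal stand-in, and the caching of summary information in ancestors is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Box:
    mark: Optional[int] = None
    components: List["Box"] = field(default_factory=list)


def is_stable(box: Box, a: int, c: int) -> bool:
    """True if the displayed form of this box can no longer change."""
    if box.mark is not None and a < box.mark < c:
        return True                 # abandoned earlier: permanently elided
    if not box.components:
        return box.mark == a        # atomic box, fully expanded
    return all(is_stable(y, a, c) for y in box.components)


a, c = 10, 15
shown = Box(mark=10)      # atomic, confirmed as part of W
elided = Box(mark=12)     # failed in an earlier attempt
pending = Box()           # not yet examined
settled = Box(mark=10, components=[shown, elided])
open_box = Box(mark=10, components=[shown, pending])
```

A measuring pass that finds `settled` stable could record its width once and skip the descent into it on later passes.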


Refinements 


The allocation algorithm as described will show 
short phrases with low priority next to ellipses 
that stand for long phrases of higher priority. 
For example, the phrase 

big_function_name(big_var_name+x) 
may show as 

     ...(... +x) 

if only 10 character positions are available for 
it. This turns out to be unesthetic. We can remedy 
this situation by removing from P any immediate 
siblings of a box that fails the expansion step. 

In order to implement the intention of vertical 
boxes, we must introduce a cleanup step at the 
transition from stage one to stage two. For each 
line that contains more than one but not all the 
components of a vertical box, we suppress the 
expansion of all but the highest priority component 
on that line. 

In order to implement the intention of block boxes, 
we must allow the text in a block to flow from one 
line to the next as inner boxes are expanded. This 
means that we must re-allocate the cut points in a 
block each time the block is measured. In order to 
do this, and still be able to discard an allocation 
simply by stepping the value of c, the cut point 
allocation in blocks must contain a memory of the 
previous allocation. 

The allocation algorithm described above can gener- 
ate a display with multiple foci if we change the 
initial value of P and introduce an additional 
stage of expansion that precedes the two previously 
mentioned stages. Let X be the set of desired 
foci. We initialize P with the elements of X 
and compute Y, the least common ancestor of the 
boxes in X. We then run the allocation algorithm 
with the following modifications. For each box b 
removed from P, if b is already expanded, we 
ignore it; otherwise we perform an ascent step. 
When W contains Y, we switch to the normal algo- 
rithm. 

In reading mode, the focus and priority strategy 
are determined by the scrolling direction desired. 
If the user scrolls towards the end of the program, 
the new focus is the first elided phrase to the 
right of the current focus. The priority strategy 
is to assign increasing numbers from the new focus 
in a depth-first, left-to-right manner. If the 
user scrolls toward the head of the program, the 
new focus is the first elided phrase to the left of 
the current focus. The priority strategy is to 
assign increasing numbers in a depth-first, 
right-to-left manner. 


Display Generation 

The generation of the actual display buffer is a 
special case of the measuring step. Instead of 
computing the effective size of a box, we lay down 
the text in a display buffer. This step is clearly 
device dependent. With some devices, such as the 
IBM 3277, the device dependency must be carried 
back into the measuring process also since bright- 
ening may introduce a space on the display where 
the box structure did not require one. 

In many cases however, a change in dynamic and 
effective highlight class can be effected by regen- 
erating the display from the most recent 
allocation. As a result, shifting brightening from 
one allocated phrase to another can be done very 
quickly. 


BOX GENERATION 


We have described the box structure and the dis- 
play allocation process in terms of a data struc- 
ture distinct and separate from the parse tree. 
Clearly there are many points at which there is a 
one-to-one correspondence between parse tree nodes 
and boxes. Many of the differences between the 
parse tree and the box structure arise from the 
fact that all nodes in the parse tree have a bound- 
ed number of descendants, while boxes do not. 
Other differences are caused by arbitrary choices 
made in the grammar of the language. It is not 
clear whether we can design a grammar for a given 
language in such a way that it allows efficient 
parsing and at the same time reflects the intuitive 
abstract structure of the language. 

In the current implementation, the box structure is 
generated from the parse tree as a separate data 
structure by a collection of routines that embody 
the esthetic choices involved in prettyprinting as 
lines of code. A more general approach would be to 
define box generation as a collection of 
pattern-constructor pairs where the pattern match- 
es a fragment of the parse tree and the constructor 
defines a box structure. 
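
A pattern-constructor scheme of this kind might look as follows in Python. The tuple encoding of parse nodes, the tuple encoding of boxes and the single rule shown (which gives the subscript part of 'a(i)' a box of its own) are all assumptions for this sketch; the current implementation described above does not work this way.

```python
def build_box(node, rules):
    """Build a box from a parse node.

    A node is either a string (one symbol) or a tuple
    (node_type, child, child, ...).
    """
    if isinstance(node, str):
        return ("atomic", node)
    for pattern, constructor in rules:
        if pattern(node):
            return constructor(node, lambda n: build_box(n, rules))
    # No rule matched: default to a horizontal box of the children.
    return ("horizontal", [build_box(child, rules) for child in node[1:]])


RULES = [
    # Group the subscript part '(i)' of 'a(i)' into a box of its own,
    # even though the grammar has no node for it.
    (lambda n: n[0] == "subscripted",
     lambda n, rec: ("horizontal",
                     [rec(n[1]),
                      ("horizontal", [rec(child) for child in n[2:]])])),
]

tree = ("subscripted", "a", "(", "i", ")")
box = build_box(tree, RULES)
```

Adding a language would then mean supplying a new rule table rather than new formatting routines.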

If the parse tree is allowed to change, as it must 
be in a program development environment, we must 
update the box structure accordingly. The simplest 
approach is to recompute the entire box structure 
whenever any change is made to the parse tree. A 
much more effective solution is to propagate change 
markers from the point in the parse tree where a 
change is made to the root of the parse tree. Then, 
during the box generation step if all the nodes 
involved in a prettyprinting decision are unmarked 
nodes, we can use the previously generated box 
structure for that portion of the parse tree. This 
method is very natural if user updates to the pro- 
gram are incorporated into the existing parse tree 
by an incremental parser [7]. 
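
The change-marker idea can be sketched in Python as follows. The example simplifies matters by assuming one formatting decision per parse node, so that an unmarked node means an unchanged subtree; the `Node` class, the cached `box` field and the `stats` counter are hypothetical.

```python
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        self.changed = False
        self.box = None               # box generated on an earlier pass
        for child in self.children:
            child.parent = self


def mark_changed(node):
    """Propagate a change marker from the changed node up to the root."""
    while node is not None and not node.changed:
        node.changed = True
        node = node.parent


def generate(node, stats):
    """Regenerate boxes, reusing those of unmarked nodes."""
    if node.box is not None and not node.changed:
        stats["reused"] += 1
        return node.box
    stats["rebuilt"] += 1
    node.box = (node.label, [generate(ch, stats) for ch in node.children])
    node.changed = False
    return node.box


a, b = Node("a"), Node("b")
root = Node("root", [a, b])
first = {"reused": 0, "rebuilt": 0}
generate(root, first)                 # initial pass builds every box

b.label = "b2"                        # an edit somewhere in the tree
mark_changed(b)
second = {"reused": 0, "rebuilt": 0}
new_box = generate(root, second)      # only the marked path is rebuilt
```

The work of the second pass is proportional to the length of the marked path plus its immediate siblings, not to the size of the program.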

Syntax errors, comments and annotations cause simi- 
lar problems during box generation. Since they can 
occur at any point in the program, it is impracti- 
cal to account for them in the patterns that drive 
the prettyprinter. If we add them by updating the 
box structure generated from the parse tree, then 
we are updating a structure generated under the 
assumption that it specifies a complete format. 
The current implementation appends all comments and 
errors to the atomic box that immediately precedes 
them. This often has unfortunate effects on the 
format of the display. We hope to achieve a more 
pleasant format by treating unparsed text in a man- 
ner similar to the parsed text around it. Thus a 
comment between two statements would be treated 
like a statement. 


CONCLUSION 


The algorithms described in this paper have been 
implemented in Lisp, for a prototype of a program- 
ming environment that supports an extended subset 
of PL/1. We are currently re-implementing the pro- 
totype in its target language. The efficiency of 
the Lisp implementation is adequate for demon- 
stration purposes. If a user interaction does not 
cause a display allocation pass, such as shifting 
the focus to a phrase on the current screen, the 
display can be updated in about 50 ms of cpu time. 
If the display allocation procedure is invoked, the 
required cpu time is from 300 to 500 milliseconds. 
In the first case, the response is instantaneous in 
our time-sharing environment; in the second case 
there is a definite delay in system response. By 
using more compact and efficient data structures, 
we expect to speed up the response of the new 
implementation by a factor of 5. This should pro- 
vide instantaneous response in all cases and allow 
the system to be used as a serious programming 
tool. 

This method of generating displays is not limited 
to programs. We have used it for data structures 
and in a very limited way for text. Clearly, any 
structured object that cannot be shown fully on a 
single screen can be condensed with appropriate 
ellipses to make each view of it more readable. 

In conclusion, our experience shows that this meth- 
od of generating displays of programs works, and is 
adequately efficient. An important effect of this 
approach is that it emphasizes the point of view 
that a program is an abstract object that is 
defined by the structure of its components, in con- 
trast to the view that the program is a slab of text 
from which the structure is inferred every time it 
is read. 


ACKNOWLEDGMENTS 


The idea of program text expanding to fill an 
available window originated in early discussions of 
program editors with Gerry Howe and Vincent 
Kruskal. I would also like to thank my colleagues 
Cyril Alberga, Steve Fortune, Paul Kosinsky, George 
Leeman and Mark Wegman for many arguments and crit- 
icisms that helped form the ideas in this paper. 


REFERENCES 


1. Alberga, C. N., Brown, A. L., Leeman, G. B. 
   Jr., Mikelsons, M., and Wegman, M. N., A Program 
   Development Tool, Eighth Annual ACM Symposium 
   on Principles of Programming Languages, 
   January 1981, 92-104. 

2. Donzeau-Gouge, V., Huet, G., Kahn, G., Lang, B. 
   and Levy, J. J., A Structure-oriented Program 
   Editor: A First Step Towards Computer Assisted 
   Programming, International Computing Symposium 
   1975, North Holland Publishing Company, 1975. 

3. Hawkinson, L., and Kameny, S. L., LISP Edit 
   Program, LISPED, System Development Corporation, 
   TM-2337/100/01, April 1966. 

4. Swinehart, D. C., COPILOT: A Multiple Process 
   Approach to Interactive Programming Systems, 
   Stanford Artificial Intelligence Laboratory 
   Memo AIM-230, Stanford University, July 1974. 

5. Teitelbaum, R. T., The Cornell Program 
   Synthesizer: A Microcomputer Implementation of 
   PL/CS, Department of Computer Science, Cornell 
   University, TR 79-370, June 1979. 

6. Teitelman, W., INTERLISP Reference Manual, 
   Xerox Palo Alto Research Center, 1976. 

7. Wegman, M. N., Parsing for Structural Editors, 
   Twenty-first Annual IEEE Symposium on Foundations 
   of Computer Science, October 1980, 320-327. 




ON THE LINE BREAKING PROBLEM IN TEXT FORMATTING 


James O. Achugbue 

Department of Mathematical and Computer Sciences 
Michigan Technological University, Houghton MI 


ABSTRACT 

A basic problem in text formatting is that of 
determining the break points for separating a 
string of words into lines to obtain a formatted 
paragraph. When formatted text is required to be 
aligned with both the left and right margins, the 
choice of break points greatly affects the quality 
of the formatted document. This paper presents and 
discusses solutions to the line breaking problem. 
These include the usual line-by-line method, a 
dynamic programming approach, and a new algorithm 
which is optimal and runs almost as fast as the 
line-by-line method. 


KEYWORDS 

Text formatting, line breaking, text 
alignment, computer typesetting, dynamic 
programming. 


1. INTRODUCTION 

In this paper, we are concerned primarily with 
the design of algorithms to format the text of a 
paragraph for printing on an output device with 
fixed character positions, such as a line-printer. 
This is to be contrasted with printing on output 
devices with arbitrary character positions, such as 
graphic terminals and typesetting equipment. The 
former is usually referred to as text formatting 
while the term typesetting is applied to the latter 
type. Typesetting is the main concern of systems 
such as [3,4,5,6]. While this unquestionably 
yields documents of professional quality, the 
equipment required is nevertheless unavailable to 
the majority of potential users. Thus, any features 
which aim at improving the quality of text 
formatting are most welcome. A useful feature, 
often supplied by designers of text formatting 
software, for example [7,8,9], is the ability to 
align or justify the formatted text with both left 
and right margins. Text alignment poses two main 
problems. First, the determination of the break 
points for separating the words of a paragraph into 
lines, and second, the distribution of the surplus 
spaces on each line in between the words of that 
line. 


Permission to copy without fee all or part of this material is granted 
provided that the copies are not made or distributed for direct 
commercial advantage, the ACM copyright notice and the title of the 
publication and its date appear, and notice is given that copying is by 
permission of the Association for Computing Machinery. To copy 
otherwise, or to republish, requires a fee and/or specific permission. 

© 1981 ACM 0-89791-043-5/81/0600/0117 $00.75 


The first problem is usually solved by filling 
up each line as much as possible and then 
proceeding to the next. This method will be 
reviewed in the next section and used as a basis 
for the subsequent development. While it is simple 
to implement and works well in many situations, it 
does not always produce the best results when text 
is being justified. 

A common strategy for dealing with the second 
problem is to distribute the surplus spaces for 
each line between the words starting alternately 
from the left and right margins. Thus, if line one 
has additional spaces, then the spaces between the 
first and second words, the second and third words 
and so on will be increased by one until the 
surplus spaces are used up. If line two also has 
surplus spaces, they are used up by increasing the 
spaces between the last and the last but one words, 
the last but one and the preceding word, and so on. 
The aim of this strategy is to avoid "rivers" of 
white space running down the length of the page. 
Another possibility would be to assign the extra 
spaces to the inter-word gaps in a pseudo-random 
manner. However, irrespective of the surplus space 
distribution strategy, a poor line breaking 
algorithm will undoubtedly frequently produce 
poorly formatted text. This paper therefore 
focusses attention on the line breaking problem. 
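
The alternating distribution strategy described above can be sketched in a few lines of Python. This is only an illustration, not code from the paper: the function names `justify_line` and `justify` are hypothetical, each line is assumed to be given as a non-empty list of words that already fits in `width` characters, and the sketch alternates the starting margin on every line rather than only on lines that have surplus spaces.

```python
def justify_line(words, width, from_left=True):
    """Pad the gaps between words so the line is exactly `width` wide."""
    gaps = len(words) - 1
    if gaps == 0:
        return words[0]                      # a single word cannot be stretched
    surplus = width - (sum(len(w) for w in words) + gaps)
    base, extra = divmod(surplus, gaps)
    # Every gap gets 1 + base spaces; `extra` gaps get one more, counted
    # from the left margin or from the right margin.
    pad = [1 + base] * gaps
    order = range(gaps) if from_left else range(gaps - 1, -1, -1)
    for g in list(order)[:extra]:
        pad[g] += 1
    out = words[0]
    for spaces, word in zip(pad, words[1:]):
        out += " " * spaces + word
    return out


def justify(lines, width):
    """Justify every line but the last, alternating the starting margin."""
    result = []
    for n, words in enumerate(lines[:-1]):
        result.append(justify_line(words, width, from_left=(n % 2 == 0)))
    result.append(" ".join(lines[-1]))       # last line is left unaligned
    return result
```

For instance, `justify_line(["a", "bb", "c"], 7)` widens the first gap, while passing `from_left=False` widens the last gap instead.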

Another important factor which should be taken 
into consideration in connection with this problem 
is that of hyphenation. No doubt, a good 
hyphenation algorithm will help improve the quality 
of formatted text. However, hyphenation in itself 
is a complex problem and will be left out for most 
of this presentation. 

In sections 2 to 4 the algorithms are 
presented for formatting. Section 5 discusses 
extensions to typesetting and hyphenation. 





2. THE LINE-BY-LINE METHOD 


It is assumed that we have as input a 
paragraph consisting of a sequence of N>0 words. 
Here, a word is simply a string of non-blank 
characters. The number of characters in each word 
is given in the array W[I], 1<=I<=N. The paragraph 
will be formatted into M lines of D characters 
each. The line breaking problem is solved by 
specifying for the J-th line, 1<=J<=M, the index 
S[J] of the first word of the line. (The major 
variables used by all the algorithms are summarized 
in table 1.) 


N         number of words in paragraph 
M         number of formatted lines 
D         maximum number of characters per line 
W[I]      number of characters in the I-th word 
S[I]      index of first word in I-th line, 
          that is, I-th line starts with W[S[I]]. 
L[I]      length of I-th formatted line before 
          distribution of surplus spaces. 
E[I]      index of first word, line I, for earliest 
          breaking 
F[I,J]    formatted length from I-th to J-th word 
C[I,J]    cost function, dynamic programming 
c[I]      cost function, line-breaker, = C[I,N] 


TABLE 1: 
Major variables referenced by the algorithms. 


The line-by-line method is the one that 
immediately comes to mind and has been used in many 
text formatting programs. It is strongly appealing 
in its simplicity. The computation of the break 
points or equivalently the indices for words at the 
beginning of each line is given in algorithm 
LINE-BY-LINE. Note that arrays L, S and W need not 
be saved in actual implementations unless they are 
required for some other purposes. They are kept in 
this presentation to facilitate the discussion in 
subsequent sections. Clearly, then, the 
line-by-line method can be implemented so that it 
has O(N) worst case time complexity and requires 
storage mainly for the one line of output. (The 
algorithms are also given in a PASCAL-like fashion 
in the appendix). 


ALGORITHM LINE-BY-LINE 


/D,L,M,N,S,W are as given in Table 1/ 

(1) /initialize/ 
    M <- 1, S[1] <- 1, L[M] <- W[1] 
    I <- 2, if I > N then stop 

(2) /add word to line/ 
    L[M] <- L[M] + 1 + W[I] 
    if L[M] <= D then goto (4) 

(3) /start new line/ 
    L[M] <- L[M] - 1 - W[I] 
    M <- M + 1, S[M] <- I, L[M] <- W[I] 

(4) /test for completion/ 
    I <- I + 1, if I <= N then goto (2) 
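
As a cross-check of the steps above, here is a minimal Python sketch of the same greedy scan. It is an illustration under stated assumptions rather than the paper's code: indices are 0-based (the paper's are 1-based), every word is assumed to be no longer than D, and the name `line_by_line` is chosen for this sketch only. The sample text is the paragraph quoted from [6] in the next part of this section.

```python
def line_by_line(W, D):
    """Return (S, L): 0-based start index and unjustified length of each line."""
    S, L = [0], [W[0]]
    for i in range(1, len(W)):
        if L[-1] + 1 + W[i] <= D:     # word i still fits on the current line
            L[-1] += 1 + W[i]
        else:                         # otherwise word i starts a new line
            S.append(i)
            L.append(W[i])
    return S, L


text = ("We live in a print-oriented society. Every day we produce a huge "
        "volume of printed material, ranging from handbills to heavy "
        "reference books. Despite the mushroom growth of electronic media, "
        "print remains the most versatile and most widely used medium for "
        "mass communication.")
W = [len(word) for word in text.split()]
S, L = line_by_line(W, 47)
surplus = [47 - length for length in L[:-1]]   # spaces to distribute per line
```

With D = 47 the first five entries of `surplus` come out as 1, 2, 6, 10 and 7, the figures quoted for Sample Paragraph #1.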


The effect of algorithm LINE-BY-LINE on a 
short sample paragraph from [6], formatted 47 
characters to a line, is given below. The surplus 
spaces in this paragraph have been distributed 
according to the alternate left and right fashion. 
Note that of the seven lines in the paragraph, the 
first five have 1, 2, 6, 10, 7 surplus spaces 
respectively. This means that in the fourth line, 
10 spaces have to be distributed between six words, 
resulting in triple spacing between some of them. 
Pathological cases similar to this and worse abound 
in the literature. It would seem that this 
formatting can be improved by transferring the last 
words of the first, second, third and fourth lines 
to the beginning of the second, third, fourth and 
fifth lines respectively. This has been done in 
Sample Paragraph #2 where the surplus spaces are 
obviously more evenly distributed among the lines. 
In fact, the first paragraph has eight occurrences 
of triple spacing compared to two in the second. 


Sample Paragraph #1: 

We live in a print-oriented society. Every day 
we produce a huge volume of printed material, 
ranging from handbills to heavy reference 
books. Despite the mushroom growth of 
electronic media, print remains the most 
versatile and most widely used medium for mass 
communication. 

Sample Paragraph #2: 

We live in a print-oriented society. Every 
day we produce a huge volume of printed 
material, ranging from handbills to heavy 
reference books. Despite the mushroom growth 
of electronic media, print remains the most 
versatile and most widely used medium for mass 
communication. 


3. A DYNAMIC PROGRAMMING SOLUTION 

The improvements to the sample paragraph 
indicated in the preceding section demonstrate that 
line breaking as done by the line-by-line method 
does not always produce the best results. In this 
section, a dynamic programming solution for optimal 
line breaking is presented. This idea is not new. 
Knuth [5] indicates that he uses such an approach 
for line breaking in his typesetting system. 

The key to the improvements in the preceding 
section arises from the fact that when a sequence of 
words has to be broken into two or more lines it 
should be broken in such a way that the lines are 
equally used up or very nearly so. The idea is to 
eliminate extreme variations in the amount of 
surplus space to be distributed among the lines. In 
other words, justified text will look better if the 
unjustified version has minimum raggedness. 

A second important idea to keep in mind is 
that if a sequence of words will fit on one line, 
then there is no point in, and no attempt should be 
made at, splitting it into several lines. 

These considerations lead to the following 
definitions. First, the formatted length F[I,J] of 
words I to J is defined as the width that the words 
will occupy. Thus 



    F[I,J] = W[I] + 1 + W[I+1] + 1 + ... + W[J]. 

Second, the following cost function is suggested 
for minimization. 

             2                  if F[I,J] <= D and J = N 
    C[I,J] = 1 + 1/F[I,J]       if F[I,J] <= D and J < N 
             min(C[I,K] * C[K+1,J], I <= K < J), 
                                otherwise. 

The cost function, C[I,J], will be the cost of 
formatting words I to J. It recognizes the fact 
that the last line of a paragraph need not be (and 
is not normally) aligned with the right margin. 
This case is recognized by the condition J = N. It 
also attempts no splitting of a sequence of words 
that will fit on one line. Such a sequence simply 
contributes a factor of 1 + 1/F[I,J] to the cost of 
the paragraph. When a split has to be made, 
however, the break point is chosen from all 
possible candidates so as to minimize the overall 
cost. 
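
To make the cost function concrete, the following Python sketch evaluates it for a breaking that is already fixed: a factor of 1 + 1/x for each line of unjustified length x except the last, and a factor of 2 for the last line. The function name `paragraph_cost` is hypothetical, and the two length lists are the unjustified line lengths counted by hand from Sample Paragraphs #1 and #2 at 47 characters per line.

```python
def paragraph_cost(line_lengths):
    """2 times the product of (1 + 1/x) over every line except the last."""
    cost = 2.0                        # contribution of the last line
    for x in line_lengths[:-1]:
        cost *= 1.0 + 1.0 / x
    return cost


# Unjustified line lengths of the two sample paragraphs (D = 47).
sample1 = [46, 45, 41, 37, 40, 46, 14]
sample2 = [42, 39, 41, 44, 43, 46, 14]

cost1 = paragraph_cost(sample1)
cost2 = paragraph_cost(sample2)
```

Both breakings use the same total length on their first six lines, yet `cost2` is smaller than `cost1` because the lengths in the second paragraph are closer together.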

The discussion following presentation of the 
line-by-line algorithm suggests the following 
definition of optimally formatted text which we 
shall adopt. A paragraph W[1]...W[N] is optimally 
formatted if it is broken into the fewest number of 
lines and the surplus spaces on each line, not 
counting the last line, are as close together as 
possible. 

We argue that minimizing C[1,N] will result in 
an optimally formatted paragraph. First note that 
if a line is split into two the cost function will 
increase. Hence, a paragraph with minimum C[1,N] 
will have the fewest number of lines. Secondly, we 
show that the function is minimized by having equal 
length lines (not counting the last one). Let the 
line lengths for the first m-1 lines be x[1], x[2], 
..., x[m-1]. The final cost is twice the product of 
(1+1/x[i]), 1<=i<=m-1. Now, given that a+b is 
constant it is straightforward to prove that 
(1+1/a)(1+1/b) is minimal when a=b. In the general 
case when m>3, assume the lengths x[i], 1<=i<=m-1 
are optimal and x[j] is not equal to x[j+1] for 
some j. Then we can lower the overall cost by 
keeping the other x[i] and replacing x[j], x[j+1] 
by (x[j]+x[j+1])/2. This contradicts the fact that 
we had the minimum cost. 
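
The two-line step that the argument calls straightforward can be written out as follows. This is a sketch of one possible derivation, not the author's own; it assumes a, b > 0 and writes s for the fixed sum a + b.

```latex
% Assumption: a, b > 0 and a + b = s is held fixed.
\begin{align*}
\left(1+\frac{1}{a}\right)\left(1+\frac{1}{b}\right)
  &= 1 + \frac{a+b}{ab} + \frac{1}{ab}
   = 1 + \frac{s+1}{ab}, \\
ab &= \frac{s^{2} - (a-b)^{2}}{4} \le \frac{s^{2}}{4},
  \quad\text{with equality iff } a = b.
\end{align*}
```

Since s is fixed, the product is smallest when ab is largest, which happens exactly when a = b.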

In order to compute the optimal breaking 
indices, note that if W[I]...W[J] has to be broken 
into W[I]...W[K] and W[K+1]...W[J], then both 
subsequences W[I]...W[K] and W[K+1]...W[J] must 
each be optimally formatted, this time taking into 
consideration the last line of any subsequence 
which is not the last line of the paragraph. The 
computation of optimum cost C[1,N] is given in 
algorithm DYNAMIC. It is similar to many other 
dynamic programming algorithms, [1,2] for example, 
and the modifications required to keep track of the 
breaking indices are a straightforward exercise. 

It is also straightforward to determine that 
algorithm DYNAMIC takes O(N**2) space and O(N**3) 
time. The algorithm is thus too costly for regular 
use and one would rather put up with the poorer 
results of line-by-line processing. However, by 
combining features of this algorithm and the 
line-by-line one, a much faster optimal solution 
can be devised. 


As expected, application of the dynamic 
programming approach to the sample paragraph yields 
the much improved version given in Sample Paragraph 
#2. 


ALGORITHM DYNAMIC 


/computation of C[1,N]/ 
/Only upper diagonal of C computed/ 
/C, D, F, N and W are explained in table 1/ 

(1) /initialize/ 
    C[I,J] <- F[I,J] <- 0, 1<=I<=N, 1<=J<=N. 
    F[I,I] <- W[I], C[I,I] <- 1 + 1/W[I], 1<=I<=N. 
    C[N,N] <- 2 /a last line costs 2, by the case J = N/ 
    I <- N - 1 

(2) /loop on rows from last to first/ 
    J <- I + 1 

(3) /loop on columns from I+1 to N/ 
    /calculate length/ 
    F[I,J] <- F[I,J-1] + 1 + W[J] 
    if F[I,J] > D then goto (5) 

(4) /words I to J fit on one line/ 
    if J = N then C[I,J] <- 2 
    else C[I,J] <- 1 + 1/F[I,J]. 
    goto (6) 

(5) /split words I to J/ 
    C[I,J] <- minimum{C[I,K] * C[K+1,J], I<=K<J} 

(6) /end loop on columns/ 
    J <- J + 1, if J <= N then goto (3) 

(7) /end loop on rows/ 
    I <- I - 1, if I > 0 then goto (2) 
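
The same computation can be sketched in Python as a hedged illustration. The sketch uses 0-based indices, assumes that every single word fits on a line (W[i] <= D), and follows the definition of C in giving the one-word range ending at the last word a cost of 2. The name `dynamic_cost` is chosen for this sketch only.

```python
def dynamic_cost(W, D):
    """Optimal cost of formatting all words, by the O(N**3) recurrence."""
    n = len(W)
    F = [[0] * n for _ in range(n)]       # F[i][j]: formatted length of i..j
    C = [[0.0] * n for _ in range(n)]     # C[i][j]: cost of formatting i..j
    for i in range(n - 1, -1, -1):        # rows from last to first
        for j in range(i, n):             # upper triangle only
            F[i][j] = W[i] if j == i else F[i][j - 1] + 1 + W[j]
            if F[i][j] <= D:
                # Words i..j fit on one line; a last line costs 2.
                C[i][j] = 2.0 if j == n - 1 else 1.0 + 1.0 / F[i][j]
            else:
                # Words i..j must be split; try every break point k.
                C[i][j] = min(C[i][k] * C[k + 1][j] for k in range(i, j))
    return C[0][n - 1]
```

For three words of length 3 and D = 7, the sketch chooses two words on the first line and one on the last, giving (1 + 1/7) * 2 = 16/7.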


4. THE LINE BREAKER 

We begin this section by first proving a 
simple but important property of the line-by-line 
algorithm. Let P[I], 1<=I<=M be the indices for 
optimal starting words in a paragraph. Then, 
P[I]<=S[I], 1<=I<=M. (Recall that S[I] are 
determined by algorithm line-by-line.) This 
property is easily shown by contradiction as 
follows. Let J be the smallest index for which 
P[J] > S[J]. Clearly, P[1]=S[1]=1. It is also 
trivial that P[2]<=S[2]. Thus, J>2. Now, words 
P[J-1],...,(P[J]-1) must fit on one line. Since 
P[J-1]<=S[J-1] and P[J]>S[J], it follows that words 
S[J-1],...,S[J] also fit on one line. This is not 
possible, since by the line-by-line method, word 
S[J] could not be added to the (J-1)-th line. 
Hence, no J exists for which P[J]>S[J]. 

A direct consequence of the above property is 
that algorithm line-by-line formats the paragraph 
into the minimum number of lines possible, thus 
satisfying part of the conditions for optimal 
formatting. Intuitively, this is to be expected 
from the greedy approach. More importantly, this 
means that the improvements obtainable by using 
algorithm dynamic are solely by breaking some lines 
earlier, without further changes in the actual 
number of lines. Thus, in computing C[1,N], we may 
seek the first optimal break point between words 
S[1] and S[2]. Another way of looking at it is 
that one should seek the position to break off 
one line at a time. Using this approach, the 
computation of C[1,N] can be sped up to O(N**2) 
time. However, there are further improvements to be 
had and we shall proceed a little differently. 






Breaking with the set of indices S[1],...,S[M] 
provides latest breaking points for formatting in 
the minimum number of lines, M. Now, consider 
breaking with the earliest breaking indices, E[I], 
that result in M formatted lines. If P[I] gives the 
indices for an optimal set of break points, then 
P[M]=S[M] since clearly there is no gain in pushing 
any words from the (M-1)-th line to the M-th line. 
It is also clear that the optimal breaking index 
P[I] lies between E[I] and S[I] and since P[M]=S[M] 
one might as well choose E[M]=S[M]. The remaining 
indices E[1],...,E[M-1] are then the earliest 
breaking indices for formatting words 1 to (S[M]-1) 
into M-1 lines and are computed by performing the 
line-by-line algorithm in reverse, that is, 
scanning the words in reverse order and filling up the 
last line and then the last but one line and so on. 
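
The reverse scan can be sketched in Python as follows. This is an illustration only: indices are 0-based, every word is assumed to fit on a line, and the function names `latest_breaks` and `earliest_breaks` are hypothetical. The last line is pinned to the start found by the forward scan, and the remaining lines are filled from the end of the paragraph towards its beginning.

```python
def latest_breaks(W, D):
    """S: 0-based line starts from the forward line-by-line scan."""
    S, used = [0], W[0]
    for i in range(1, len(W)):
        if used + 1 + W[i] <= D:
            used += 1 + W[i]
        else:
            S.append(i)
            used = W[i]
    return S


def earliest_breaks(W, D):
    """E: line starts obtained by filling lines from the end backwards."""
    S = latest_breaks(W, D)
    if len(S) == 1:
        return [0]
    E = [S[-1]]                  # the last line keeps its forward-scan start
    end = S[-1] - 1              # last word of the last-but-one line
    used = W[end]
    for i in range(end - 1, -1, -1):
        if used + 1 + W[i] <= D:
            used += 1 + W[i]     # word i joins the line being filled
        else:
            E.append(i + 1)      # word i+1 starts the line just completed
            used = W[i]
    E.append(0)                  # the first line always starts at word 0
    E.reverse()
    return E
```

For word lengths [2, 2, 2, 2, 2, 8] and D = 8, the forward scan starts its lines at words 0, 3 and 5, while the reverse scan starts them at 0, 2 and 5, so only the second line has any slack.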

Thus, the optimal set of breaking indices P[I] 
satisfy the condition E[I]<=P[I]<=S[I] where 
E[1]=P[1]=S[1]=1 and E[M]=P[M]=S[M]. We have 
therefore significantly reduced the search regions 
for the optimal starting indices P[I]. The improved 
computation of an optimal set of starting indices 
is given in algorithm LINE-BREAKER. It is assumed 
that the latest breaking points S[I] and associated 
line lengths L[I] have been computed in the 
line-by-line fashion, and the earliest indices E[I] 
by the reverse process as explained above. The 
algorithm uses the same cost function as for 
DYNAMIC but computes only a small number of the 
matrix entries. The order of computation is 
C[J,N], J=E[M-1]...S[M-1], then C[J,N], 
J=E[M-2]...S[M-2] and so on until C[1,N]. Since the 
second index is always the same (only elements from 
the last column are computed) only the first 
index J is used, giving the cost c[J]=C[J,N]. Note 
that in computing the cost c[J] the search is done 
for the next breaking index (breaking off one line 
at a time) which is known to lie in the range 
E[I+1]...S[I+1]. Note further that it has also 
been arranged to compute the length function only 
where necessary. 

The correctness of algorithm line-breaker can 
be surmised from the preceding discussion. The 
algorithm clearly uses O(N) space principally for 
the required linear arrays. 

Define the slack on the I-th line to be 
T[I]=S[I]-E[I]. In the LINE-BREAKER algorithm, the 
loop on I is performed for M-1 iterations and 
within this loop, the loop on J is performed for 
T[I] iterations and within the latter there is yet 
another loop on K which is performed for T[I+1] 
iterations. These loop operations thus require 
O(T[1]T[2] + T[2]T[3] + ... + T[M-1]T[M]) time. 
One may use as a very loose upper bound the maximum 
slack, v, and bound the above expression with 
Mv**2. Since the computation of latest and 
earliest breaking indices and the initial line 
lengths, L[I], as well as the final recovery of the 
optimal set of breaking indices are each linear in 
N, we obtain a total running time of O(N+M(v**2)). 

It should be stressed that the search for 
optimal breaking has thus been reduced to searching 
among those words which can possibly be moved from 
their initially assigned lines to the next one 
without increasing the number of lines, that is, 
the words which constitute the slack on the I-th 
line. Full advantage is taken of the work done by 
the simple line-by-line procedure. In practice, 
the slack per line is a very small number so that 
the algorithm is almost linear in behavior. 


ALGORITHM LINE-BREAKER 


/fast computation of optimal breaking indices 
 P[1], P[2], ..., P[M], M > 2. 
 Assume S[I] - latest breaking indices 
        E[I] - earliest breaking indices, and 
        L[I] - lengths of lines from the 
               line-by-line approach 
 have been computed. 
 INFINITE - any number larger than maximum 
            possible c[I]./ 

(0) /initialize/ 
    I <- M - 1, c[S[M]] <- 2 

(1) /loop on lines, backwards/ 
    X <- L[I] - 1 - W[S[I]] 
    J <- S[I] 

(2) /loop over I-th slack/ 
    X <- X + 1 + W[J], Y <- X + 1 + W[S[I+1]] 
    c[J] <- INFINITE 
    K <- S[I+1] 

(3) /loop over (I+1)-th slack/ 
    Y <- Y - 1 - W[K], if Y > D then goto (5) 
    Z <- (1 + 1/Y) * c[K] 
    if Z >= c[J] then goto (5) 

(4) /update c[J]/ 
    c[J] <- Z, P[J] <- K 

(5) /end loop over (I+1)-th slack/ 
    K <- K - 1, if K >= E[I+1] then goto (3) 

(6) /end loop over I-th slack/ 
    J <- J - 1, if J >= E[I] then goto (2) 

(7) /end loop on lines/ 
    I <- I - 1, if I > 0 then goto (1) 

(8) /retrieve optimal breaking indices/ 
    J <- P[1], P[1] <- 1, I <- 2 

(9) /retrieve loop: P[J] is read before P[I] is overwritten/ 
    K <- P[J], P[I] <- J, J <- K 
    I <- I + 1, if I < M then goto (9) 

(10) /the last line starts at the latest breaking index/ 
    P[M] <- S[M] 
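
For comparison, here is a compact Python sketch of the same search. It is an illustration under stated assumptions, not the paper's implementation: indices are 0-based, every word is assumed to fit on a line, line lengths are taken from prefix sums instead of the running values X and Y, and the helper names (`forward_breaks`, `backward_breaks`, `line_breaker`) are hypothetical.

```python
import math


def forward_breaks(W, D):
    """Latest line starts S, by the line-by-line scan."""
    S, used = [0], W[0]
    for i in range(1, len(W)):
        if used + 1 + W[i] <= D:
            used += 1 + W[i]
        else:
            S.append(i)
            used = W[i]
    return S


def backward_breaks(W, D, last_start):
    """Earliest line starts E, with the last line pinned at last_start."""
    E, used = [last_start], W[last_start - 1]
    for i in range(last_start - 2, -1, -1):
        if used + 1 + W[i] <= D:
            used += 1 + W[i]
        else:
            E.append(i + 1)
            used = W[i]
    E.append(0)
    return E[::-1]


def line_breaker(W, D):
    """Return the 0-based start of every line in an optimal breaking."""
    S = forward_breaks(W, D)
    M = len(S)
    if M < 3:
        return S                     # no slack to exploit with one or two lines
    E = backward_breaks(W, D, S[-1])
    prefix = [0]
    for w in W:
        prefix.append(prefix[-1] + w)

    def length(j, k):                # formatted length of words j..k-1
        return prefix[k] - prefix[j] + (k - j - 1)

    cost = {S[-1]: 2.0}              # cost[j] plays the role of c[J] = C[J,N]
    nxt = {}                         # nxt[j]: best start of the following line
    for line in range(M - 2, -1, -1):
        for j in range(S[line], E[line] - 1, -1):
            cost[j] = math.inf
            for k in range(S[line + 1], E[line + 1] - 1, -1):
                y = length(j, k)
                if y > D:
                    continue
                z = (1.0 + 1.0 / y) * cost[k]
                if z < cost[j]:
                    cost[j], nxt[j] = z, k
    P = [0]
    while len(P) < M:                # follow the chain of chosen breaks
        P.append(nxt[P[-1]])
    return P
```

Applied to the sample paragraph with D = 47, the sketch returns the 0-based line starts 0, 7, 15, 21, 27, 34, 42, which are the breaks shown in Sample Paragraph #2.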


5. EXTENSIONS 

Although this investigation was originally 
prompted by the need for improved formatting, the 
resulting algorithm can be extended in a 
straightforward manner to deal with variable pitch 
fonts for typesetting. This requires redefining 
W[i] to be the width of word i rather than the 
number of characters in it. Similarly, redefine D 
to be the line width and replace "1" for space 
widths by the minimum allowed width for a space in 
a justified line. 
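
As a small illustration of this redefinition, the greedy scan carries over unchanged once character counts are replaced by measured widths. The sketch below is an assumption-laden example: the name `line_by_line_widths` and the parameter `min_space` (the minimum allowed width of an inter-word space) are hypothetical, and widths are taken to be plain numbers in some common unit.

```python
def line_by_line_widths(widths, line_width, min_space):
    """Greedy line starts when words have arbitrary (e.g. fractional) widths."""
    starts, used = [0], widths[0]
    for i in range(1, len(widths)):
        if used + min_space + widths[i] <= line_width:
            used += min_space + widths[i]    # word i fits with a minimal space
        else:
            starts.append(i)                 # word i begins a new line
            used = widths[i]
    return starts
```

The same substitution (widths for W, the line width for D, and the minimum space width for each "1") would apply to the earliest-break scan and to the cost computation.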

Algorithm LINE-BREAKER could also be extended 
for use with an automatic hyphenation process in a 
manner that would minimize the words considered for 
breaking. Assuming the hyphenation points, if any, 
within each word are given, one possibility would 
be to ignore these points in the initial 
computation of latest and earliest breaking 
indices, and take them into consideration in the 
final stage when attempting to equalize the 
formatted lengths of the lines. This approach will 
not always produce the absolute minimum number of 
lines but as a trade-off fewer words will be 
considered for breaking. 




6. CONCLUSIONS 




Three algorithms for the line breaking problem 
have been discussed in this paper. The line-by-line 
method is simple but often produces undesirable 
results. The dynamic programming approach 
generally results in costly computations. By 
combining features of both methods, a new hybrid 
algorithm, LINE-BREAKER, is produced, which 
guarantees optimal results for the type of printing 
environments under consideration, uses much less 
space than dynamic programming and is almost as 
fast as the basic line-by-line processing. 
Line-breaker is sufficiently fast for regular use 
when text is to be justified. 

These algorithms have so far been tested on a 
stand alone basis. It is expected that the 
LINE-BREAKER approach will be incorporated in 
future text formatting systems. 


ACKNOWLEDGEMENTS 

The author is grateful to the referees for 
their very helpful suggestions and to his 
colleagues Karl Ottenstein and John Lowther for 
proof-reading several versions of the paper. 


REFERENCES 


[1] A. V. Aho, J. E. Hopcroft & J. D. Ullman, 
The Design and Analysis of Computer Algorithms, 
Addison Wesley, 1974, p59. 

[2] K. Q. Brown, 
Dynamic Programming in Computer Science, 
Tech. Rep. CMU-CS-79-106, Dept of Computer Sc., 
Carnegie-Mellon Univ., Pittsburgh Pa, 1979. 

[3] B. W. Kernighan, M. E. Lesk, J. F. Ossanna, 
Unix Time Sharing System: Document Preparation, 
Bell System Technical Journal 57(6), 
July-Aug 1978, pp2115-2134. 

[4] B. W. Kernighan, L. L. Cherry, 
A System for Typesetting Mathematics, 
CACM 18, March 1975, pp151-157. 

[5] D. E. Knuth, 
TAU EPSILON CHI, A System for Technical Text, 
American Mathematical Society, Providence, 
Rhode Island, 1979. 

[6] J. Sachs, 
Economical Typesetting from Small Computer Text 
Files, Proc of the Third Symposium on Small 
Computers, sponsored by ACM SIGSMALL & SIGPC, 
Sept 1980, Palo Alto, California, pp184-188. 

[7] J. Pearkins, 
FMT Reference Manual, University of Alberta 
Computing Services, April 1976. 

[8] TXTFORM - A Text Formatter, 
Dept of Computer Science, Purdue University, 
West Lafayette, Ind., 1979. 

[9] DOC Processor Bulletin, 
Academic Computing Services, Michigan 
Technological University, Houghton, MI. 


APPENDIX 


PROCEDURE LINE-BY-LINE; 
(* D,L,M,N,S,W ARE EXPLAINED IN TABLE 1 *) 
BEGIN 
  (* INITIALIZE *) 
  M := 1; 
  S[1] := 1; 
  L[M] := W[1]; 
  FOR I := 2 TO N DO 
  BEGIN 
    (* ADD NEXT WORD TO CURRENT LINE *) 
    L[M] := L[M] + 1 + W[I]; 
    IF (L[M] > D) THEN 
    BEGIN 
      L[M] := L[M] - 1 - W[I]; 
      (* START NEW LINE *) 
      M := M + 1; 
      S[M] := I; 
      L[M] := W[I] 
    END 
  END 
END; 


PROCEDURE DYNAMIC; 
(* COMPUTATION OF OPTIMAL COST C[1,N] 
   C[I,J], F[I,J] EXPLAINED IN TABLE 1. *) 
BEGIN 
  (* INITIALIZE VARIABLES *) 
  FOR I := 1 TO N DO 
  BEGIN 
    FOR J := 1 TO N DO 
    BEGIN 
      F[I,J] := 0; 
      C[I,J] := 0 
    END; 
    F[I,I] := W[I]; 
    C[I,I] := 1 + 1 / W[I] 
  END; 
  (* A LAST LINE COSTS 2, EVEN IF IT HOLDS ONLY WORD N *) 
  C[N,N] := 2; 
  (* COMPUTE UPPER DIAGONAL OF F AND C 
     IN REVERSE ROW ORDER *) 
  FOR I := N-1 DOWNTO 1 DO 
  BEGIN 
    FOR J := I+1 TO N DO 
    BEGIN 
      (* CALCULATE FORMATTED LENGTH *) 
      F[I,J] := F[I,J-1] + W[J] + 1; 
      IF F[I,J] <= D THEN 
      BEGIN 
        (* WORDS I TO J FIT ON LINE *) 
        IF J = N THEN C[I,J] := 2 
        ELSE C[I,J] := 1 + 1 / F[I,J] 
      END 
      ELSE 
      BEGIN 
        (* WORDS I TO J HAVE TO BE SPLIT *) 
        C[I,J] := C[I,I] * C[I+1,J]; 
        FOR K := I+1 TO J-1 DO 
        BEGIN 
          T := C[I,K] * C[K+1,J]; 
          IF T < C[I,J] THEN C[I,J] := T 
        END 
      END 
    END 
  END 
END; 


PROCEDURE LINE-BREAKER; 
(* COMPUTATION OF OPTIMAL STARTING INDICES P[I] 
   FOR M > 2. 
   ASSUME S[I], E[I], L[I] (DEFINED IN TABLE 1) 
   HAVE BEEN COMPUTED. X,Y,Z ARE USED TO KEEP 
   TRACK OF REQUIRED LENGTHS. 
   INFINITE IS ANY NUMBER LARGER THAN MAXIMUM 
   POSSIBLE COST c[I] *) 
BEGIN 
  c[S[M]] := 2.0; 
  (* LOOP ON LINES BACKWARDS *) 
  FOR I := M-1 DOWNTO 1 DO 
  BEGIN 
    X := L[I] - 1 - W[S[I]]; 
    (* LOOP OVER I-TH SLACK *) 
    FOR J := S[I] DOWNTO E[I] DO 
    BEGIN 
      X := X + 1 + W[J]; 
      Y := X + 1 + W[S[I+1]]; 
      c[J] := INFINITE; 
      (* LOOP OVER (I+1)-TH SLACK *) 
      FOR K := S[I+1] DOWNTO E[I+1] DO 
      BEGIN 
        Y := Y - 1 - W[K]; 
        IF Y <= D THEN 
        BEGIN 
          (* UPDATE c[J] *) 
          Z := (1 + 1 / Y) * c[K]; 
          IF Z < c[J] THEN 
          BEGIN 
            c[J] := Z; 
            P[J] := K 
          END 
        END 
      END 
    END 
  END; 
  (* RETRIEVE OPTIMAL STARTING INDICES; 
     P[J] IS READ BEFORE P[I] IS OVERWRITTEN *) 
  J := P[1]; P[1] := 1; 
  FOR I := 2 TO M-1 DO 
  BEGIN 
    K := P[J]; P[I] := J; J := K 
  END; 
  P[M] := S[M] 
END; 




A Redisplay Algorithm 


James Gosling 
Carnegie-Mellon University 


Abstract 

This paper presents an algorithm for updating the 
image displayed on a conventional video terminal. It 
assumes that the terminal is capable of doing the usual 
insert/delete line and insert/delete character operations. 
It takes as input a description of the image currently on the 
screen and a description of the new image desired and 
produces a series of operations to do the desired 
transformation in a near-optimal manner. The algorithm is 
interesting because it applies results from the theoretical 
string-to-string correction problem (a generalization of the 
problem of finding a longest common subsequence), to a 
problem that is usually approached with crude ad-hoc 
techniques. 


1. Introduction 

Redisplay algorithms are an important part of many 
modern video editors. It is the responsibility of the 
redisplay to maintain the correspondence between the 
image on the screen and the text being edited. When a 
change is made to the text, the image on the screen must 
be updated to reflect the fact. An example of such an 
editor is Emacs [8]. 

Communication bandwidth limitations make it 
necessary to attempt to transmit to the screen only 
information about changes that have been made. The 
process is complicated when the insert/delete line and 
insert/delete character operations available on many 
commercial video terminals are used. These operations 
allow lines of text to be moved up and down on the screen 
by deleting lines or inserting blank lines; and they allow 
text to be moved left or right on a line by inserting and 
deleting individual characters. 

Redisplay algorithms can be grouped into two 
categories: those that intermix the display update with the 
data base update, and those that separate them. 

The first approach interweaves display changes with 
data base changes. Any time a change is made to the data 
base that change is reflected immediately on the screen. 
This approach is easy to implement since there is usually a 


Permission to copy without fee all or part of this material is 
granted provided that the copies are not made or distrib- 
uted for direct commercial advantage, the ACM copyright 
notice and the title of the publication and its date appear, 
and notice is given that copying is by permission of the 
Association for Computing Machinery. To copy otherwise, 
or to republish, requires a fee and/or specific permission. 


close correspondence between changes on the screen and 
in the data base. For example if the data base is a text file 
then inserting a line into the data base should cause a line 
to be inserted on the screen. But this approach has 
disadvantages: complicated compound operations can 
cause unnecessary and confusing manipulation of the 
screen; when the data base does not correspond closely to 
the image on the screen the technique breaks down 
completely (as in a structure editor); and this interweaving 
of display and data base code can make the program 
difficult to debug and modify and is generally poor 
programming practice. An example of this approach can 
be found in [3]. 

The second approach separates display and data base 
changes. The data base is changed without considering the 
effects on the display; an update procedure is called 
periodically to analyze the new data base and update the 
display. The advantages of this approach parallel the 
disadvantages of the first: compound operations are 
handled gracefully and the separation of the data base and 
the display yields a more reliable, maintainable and clean 
program. The principal disadvantage is poor 
performance, which can nearly be eliminated by using 
good algorithms and a good implementation. 

Most examples of this second approach use 
straightforward, unsophisticated algorithms [4]. The 
algorithm presented in this paper employs a sophisticated 
but simple algorithm which is based on an algorithm for 
the string-to-string correction problem. 


2. The String-to-String Correction Problem 

In [9] the string-to-string correction problem along with 
an $O(n^2)$ dynamic programming solution is presented. 
This problem is concerned with determining the edit 
distance between two strings, which is defined as the 
shortest sequence of edit operations needed to transform 
one string to another. An edit operation is the insertion, 
deletion, or alteration of an element of a string. The 
intended applications of the solution were in automatic 
spelling correction, and in the solution of the longest 
common subsequence problem [5, 1]. Faster algorithms 
exist, but they are more complicated and their speed 
advantages are only realized for large problem sizes [6, 7]. 

Wagner and Fischer¹ define a trace to be a description 
of how an edit sequence S transforms string A into string 
B, ignoring order and redundancy in S. For example: 


[Figure 2-1: An example of a trace] 


¹The text of much of this section borrows heavily from the presentation 
in [9]. They said it well and I doubt that I could improve on it. 


© 1981 ACM 0-89791-043-5/81/0600/0123 $00.75 


A line in this diagram joining element i of A to element j 
of B means that $B_j$ is derived from $A_i$, either directly if 
$A_i = B_j$ or indirectly otherwise. Certain elements of A are 
untouched by lines; these elements are deleted by the 
transformation. Certain elements of B are untouched by 
lines; these elements are inserted by the transformation. It 
is important to note that no two lines ever cross. 

Formally, they define a trace from A to B as a triple 
(T, A, B) where A and B are strings and T is a set of ordered 
pairs $(i, j)$ satisfying: 

1. $1 \le i \le |A|$ and $1 \le j \le |B|$ 

2. for any two distinct pairs $(i_1, j_1)$ and $(i_2, j_2)$ in T: 

   a. $i_1 \ne i_2$ and $j_1 \ne j_2$ 
   b. $i_1 < i_2$ iff $j_1 < j_2$ 

Ordered pairs in the trace correspond to lines in the 
diagram. $(i, j) \in T \Rightarrow A_i$ gets transformed to $B_j$. 

Three cost functions are used: 

$C_t(A_i, B_j)$ is the cost of transforming $A_i$ to $B_j$ 
$C_i(B_j)$ is the cost of inserting $B_j$ 
$C_d(A_i)$ is the cost of deleting $A_i$ 

Let T be a trace from A to B. Let I and J be the sets of 
positions in A and B respectively not touched by any line 
in T. The total cost of applying T is then defined as: 

\[ C(T) = \sum_{(i,j) \in T} C_t(A_i, B_j) + \sum_{i \in I} C_d(A_i) + \sum_{j \in J} C_i(B_j) \] 

That is, the total cost is just the sum of the costs for all the 
transformations, deletions, and insertions. 
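
The definition of a trace and of its cost can be checked mechanically. The Python sketch below is an illustration only: the function names and the cost-function parameters `c_t`, `c_i`, `c_d` (transform, insert, delete) are labels chosen for this example, pairs are 1-based as in the text, and unit costs are assumed in the sample cost functions.

```python
def is_trace(T, A, B):
    """Check conditions 1 and 2 for a set T of 1-based pairs (i, j)."""
    pairs = sorted(set(T))
    for i, j in pairs:
        if not (1 <= i <= len(A) and 1 <= j <= len(B)):
            return False
    # Once sorted by i, both coordinates must increase strictly: this rules
    # out shared endpoints (2a) and crossing lines (2b) at the same time.
    return all(i1 < i2 and j1 < j2
               for (i1, j1), (i2, j2) in zip(pairs, pairs[1:]))


def trace_cost(T, A, B, c_t, c_i, c_d):
    """Sum the costs of all transformations, deletions and insertions."""
    touched_a = {i for i, _ in T}
    touched_b = {j for _, j in T}
    cost = sum(c_t(A[i - 1], B[j - 1]) for i, j in T)
    cost += sum(c_d(A[i - 1]) for i in range(1, len(A) + 1)
                if i not in touched_a)
    cost += sum(c_i(B[j - 1]) for j in range(1, len(B) + 1)
                if j not in touched_b)
    return cost


# Unit costs, assumed here purely for illustration.
def unit_transform(a, b):
    return 0 if a == b else 1


def unit_cost(x):
    return 1
```

With these unit costs, the trace {(1,1), (2,2), (3,3)} from "abc" to "axc" costs 1 (one alteration), and the trace {(2,1)} from "ab" to "b" costs 1 (one deletion).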

Now return to the diagrammatic representation of a
trace T from A to B. Let A = A'A'' and B = B'B'', and
suppose no line of T connects an element of A' to B'' or
A'' to B'; that is, the two strings A and B can each be split
into two strings without having a line of T cross the split.


Figure 2-2: Splitting a trace


Corresponding to this split in A and B, T can be split into
two traces, T' and T'', in the obvious way. Furthermore,
C(T) = C(T') + C(T''), so if T is a least cost trace from A
to B, then T' is a least cost trace from A' to B', and
T'' is a least cost trace from A'' to B''.

Every trace T from A to B can be split into two traces
T' and T'' as above such that the lengths of A'' and B'' are
each at most one but they are not both zero. This is the
key idea for the following theorem, upon which the edit 
distance algorithm is based. 


Notation: Let A and B be strings. Define A(i) = a
string composed of the first i elements of A, define B(j)
similarly. Define A[i] = the ith element of A, and B[j]
similarly. Define $D_{i,j}$ = the minimum cost of a trace from
A(i) to B(j). For convenience in handling boundary
conditions: define $D_{-1,j} = \infty$, $D_{i,-1} = \infty$, and $\forall k \le 0$,
A[k], B[k] = some unique string element that occurs in
neither A nor B and that can be transformed at no cost to
itself.

Theorem 1:

$$D_{0,0} = 0$$

$\forall i,j$ with $i > 0$ or $j > 0$, $0 \le i \le |A|$, $0 \le j \le |B|$:

$$D_{i,j} = \min\bigl(\, D_{i-1,j-1} + C_t(A[i],B[j]),\;\; D_{i-1,j} + C_d(A[i]),\;\; D_{i,j-1} + C_i(B[j]) \,\bigr)$$

Proof: The first part of the proof
($D_{0,0} = 0$) is trivial; it is simply the cost of
transforming the null string to the null string.

Let T be a least cost trace from A(i) to
B(j). If A[i] and B[j] are both touched by lines
in T, they must both be touched by the same
line, since otherwise these lines in T would
cross. Then at least one of the following three
cases must hold:

1. A[i] and B[j] are joined by a line of T
(i.e. $(i,j) \in T$). Then the cost of T is

$$m_1 = D_{i-1,j-1} + C_t(A[i],B[j])$$

corresponding to the cost of
transforming A(i-1) to B(j-1) plus the
cost of changing A[i] to B[j].

2. A[i] is not touched by any line in T (and
B[j] may or may not be touched by a line
in T). Then the cost of T is

$$m_2 = D_{i-1,j} + C_d(A[i])$$

corresponding to the cost of
transforming A(i-1) to B(j) plus the cost
of deleting A[i].

3. B[j] is not touched by any line in
T. Then the cost of T is

$$m_3 = D_{i,j-1} + C_i(B[j])$$

corresponding to the cost of
transforming A(i) to B(j-1) plus the cost
of inserting B[j].

Since one of the three cases above must hold
and $D_{i,j}$ is to be a minimum:

$$D_{i,j} = \min(m_1, m_2, m_3)$$

This theorem leads directly to the following 
implementation of the minimum cost computation: 




Algorithm 2:

    D[-1,-1] := 0;

    for i := 0 to |A| loop
        D[i,-1] := ∞;
    end loop;

    for j := 0 to |B| loop
        D[-1,j] := ∞;
    end loop;

    for i := 0 to |A| loop
        for j := 0 to |B| loop
            D[i,j] := min(D[i-1,j-1] + C_t(A[i],B[j]),
                          D[i-1,j]   + C_d(A[i]),
                          D[i,j-1]   + C_i(B[j]));
        end loop;
    end loop;

    mincost := D[|A|,|B|];
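The recurrence of Theorem 1 can be sketched in Python as follows. This is a minimal illustration rather than the author's implementation: the function name `min_cost_matrix` is hypothetical, and the sentinel row and column at index -1 are replaced by explicit boundary tests, which is an equivalent way of handling the edges.

```python
import math

def min_cost_matrix(a, b, c_t, c_i, c_d):
    """Return D where D[i][j] is the minimum cost of a trace from a[:i] to b[:j].

    Python sequences are 0-based, so the paper's A[i] is a[i - 1] here.
    """
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0  # null string to null string
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:   # A[i] is transformed to B[j]
                D[i][j] = min(D[i][j], D[i - 1][j - 1] + c_t(a[i - 1], b[j - 1]))
            if i > 0:             # A[i] is deleted
                D[i][j] = min(D[i][j], D[i - 1][j] + c_d(a[i - 1]))
            if j > 0:             # B[j] is inserted
                D[i][j] = min(D[i][j], D[i][j - 1] + c_i(b[j - 1]))
    return D

# Unit costs (an assumption for this example) give the familiar edit distance.
unit_t = lambda x, y: 0 if x == y else 1
unit = lambda x: 1
mincost = min_cost_matrix("kitten", "sitting", unit_t, unit, unit)[-1][-1]
```

The cost functions are passed in as parameters so that the same routine can serve both the character-level and the line-level uses described later.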


Clearly this algorithm requires $O(|A||B|)$ time and
space. This time bound has been proved optimal in [10]
for the restricted case of the longest common subsequence
problem where only equal/unequal comparisons are
allowed between elements of strings. By clever
bookkeeping, the algorithm can be altered to only require
$O(\min(|A|,|B|))$ space [5].

Only the cost of the minimum cost trace is returned by
this algorithm. To recover the trace itself, we have to
traverse matrix D from $D_{|A|,|B|}$ back to the beginning, $D_{0,0}$.
To aid in this traversal we define two functions: $W_{i,j}$ tells
us which of the three operations (insertion, deletion or
transformation) led to the optimal solution at i,j; it simply
tells us which of the three operands of min in the
calculation of $D_{i,j}$ led to the minimum. $P_{i,j}$ is an ordered
pair which gives the subscripts of the subproblem in D of
which $D_{i,j}$ is an extension.


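$$W_{i,j} = \begin{cases}
\text{transformation} & \text{if } D_{i,j} = D_{i-1,j-1} + C_t(A[i],B[j]) \\
\text{deletion} & \text{if } D_{i,j} = D_{i-1,j} + C_d(A[i]) \\
\text{insertion} & \text{if } D_{i,j} = D_{i,j-1} + C_i(B[j])
\end{cases}$$

$$P_{i,j} = \begin{cases}
(i-1,\,j-1) & \text{if } W_{i,j} = \text{transformation} \\
(i-1,\,j) & \text{if } W_{i,j} = \text{deletion} \\
(i,\,j-1) & \text{if } W_{i,j} = \text{insertion}
\end{cases}$$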
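The definitions of W and P translate almost directly into a traversal from the bottom-right corner of D back to the origin. The sketch below is illustrative only: the names `fill`, `which` and `traceback` are hypothetical, ties are broken in the fixed order transformation, deletion, insertion, and costs are assumed to be integers (or infinity) so that the equality tests are exact.

```python
import math

def fill(a, b, c_t, c_i, c_d):
    """Cost matrix of Theorem 1; the paper's A[i] is a[i - 1] here."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                D[i][j] = min(D[i][j], D[i - 1][j - 1] + c_t(a[i - 1], b[j - 1]))
            if i > 0:
                D[i][j] = min(D[i][j], D[i - 1][j] + c_d(a[i - 1]))
            if j > 0:
                D[i][j] = min(D[i][j], D[i][j - 1] + c_i(b[j - 1]))
    return D

def which(D, a, b, i, j, c_t, c_i, c_d):
    """W[i,j]: the operand of min that produced D[i][j]."""
    if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + c_t(a[i - 1], b[j - 1]):
        return "transformation"
    if i > 0 and D[i][j] == D[i - 1][j] + c_d(a[i - 1]):
        return "deletion"
    return "insertion"

def traceback(a, b, c_t, c_i, c_d):
    """List (W[i,j], i, j) from (|A|,|B|) back towards (0,0), following P[i,j]."""
    D = fill(a, b, c_t, c_i, c_d)
    i, j = len(a), len(b)
    ops = []
    while i > 0 or j > 0:
        w = which(D, a, b, i, j, c_t, c_i, c_d)
        ops.append((w, i, j))
        if w == "transformation":
            i, j = i - 1, j - 1
        elif w == "deletion":
            i -= 1
        else:
            j -= 1
    return ops

# Cost functions that only allow equal elements to be joined (assumed for the example).
lcs_t = lambda x, y: 0 if x == y else math.inf
ops = traceback("abcbst", "bctsbt", lcs_t, lambda y: 1, lambda x: 0)
kept = "".join("abcbst"[i - 1] for w, i, j in reversed(ops) if w == "transformation")
```

Here `kept` collects the elements of A joined by a line of the recovered trace, read left to right.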


3. An Example 

Figure 3-1 shows the contents of matrix D after the 
execution of algorithm 2 on the data of figure 2-1 using 
the following cost functions: 

$C_t(A_i,B_j) = 0$ iff $A_i = B_j$ and $\infty$ otherwise

$C_i(B_j) = 1$
$C_d(A_i) = 0$

The two dashed lines represent the two minimum cost
traces. These cost functions cause the algorithm to find
the longest common subsequences of "abcbst" and
"bctsbt"; namely "bcbt" and "bcst".


                      j
             -1    0    1    2    3    4    5    6
                       'b'  'c'  't'  's'  'b'  't'
  i  -1       0    ∞    ∞    ∞    ∞    ∞    ∞    ∞
      0       ∞    0    1    2    3    4    5    6
      1 'a'   ∞    0    1    2    3    4    5    6
      2 'b'   ∞    0    0    1    2    3    4    5
      3 'c'   ∞    0    0    0    1    2    3    4
      4 'b'   ∞    0    0    0    1    2    2    3
      5 's'   ∞    0    0    0    1    1    2    3
      6 't'   ∞    0    0    0    0    1    2    2

(The two dashed lines marking the minimum cost traces are not reproduced.)

Figure 3-1: Sample cost matrix
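The bottom-right entry of the matrix can be checked by hand. With the cost functions of this example only insertions cost anything, so the cost of a trace T is the number of elements of B that no line of T touches. Taking the longest common subsequence length of 4 from the text above:

$$D_{|A|,|B|} = \min_T \bigl(|B| - |T|\bigr) = |B| - \max_T |T| = 6 - 4 = 2$$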


An ambiguity exists if two of the operands of min in the
calculation of $D_{i,j}$ are equal and minimum. If this occurs it
means that there are multiple optimal solutions, and one
can be chosen arbitrarily.

The following algorithm prints out the trace in reverse
order:

Algorithm 3:

    i := |A|;
    j := |B|;

    while i > 0 or j > 0 loop
        print(W[i,j], " at ", i, j);
        (i,j) := P[i,j];
    end loop;


4. The Redisplay Algorithm

This redisplay algorithm takes as input the description
of two images. One describes the desired appearance of
the display after the redisplay is complete. The other
describes the appearance of the display before the
redisplay is invoked. The algorithm depends on the
display having the following properties:


1. The ability to rewrite in place a character on a given 
line in a given position. 

2. The ability to delete a character from a given line 
and position. All following characters on that line 
are moved left one position, with a blank character 
entering at the right margin. 

3. The ability to insert a character on a given line at a 
given position. The character originally at that 
position and all following characters are moved right 
one position. Characters that go past the right 
margin disappear. 

4. The ability to delete a given line on the screen. All 
following lines on the screen are moved up one line, 
with a blank line entering at the bottom of the 
screen. 


5. The ability to insert a line before a given line on the
screen. The line originally given, and all following
lines are moved down one line. Lines that go below
the bottom of the screen disappear.


These capabilities exist in many commercially available
video terminals, for example the Concept-100, Heathkit
H19, Infoton-400, and DEC VT-100. This algorithm was
motivated by the desire to efficiently and effectively exploit
these common capabilities.

Consider the subproblem of transforming one line of a 
display given two strings that describe the appearance of 
the display both before and after the transformation. For 
now, we will use the simple cost functions of the 
preceding example.

To do this transformation and maximize the number 
of characters that are not redrawn, we simply run 
algorithm 2 with these cost functions on the two strings. 
Then the trace gives us the characters that are to be 
preserved and how they map from the old to the new 
image. By the property of traces that no two lines cross, it 
is possible to do this operation using the allowed 
primitives: namely the simple insertion and deletion of 
characters, with the attendant left and right sliding of the 
rest of the line. If character i is to be moved to position j,
then if i=j, nothing need be done. If i<j, then move
to position i and insert j-i characters. Otherwise, if i>j,
move to position j and delete i-j characters.

This is, of course, overly simplified. One has to 
compensate for the fact that doing the transformation for 
one pair in the trace affects all pairs to the right of it. One 
must also compensate for the usual property of these 
displays that characters that move off the right edge of the 
screen are lost (lines that move off the bottom of the 
screen are also lost). If you do an insertion, then a 
deletion, the rightmost character of the line will be turned 
into a blank. This can be handled by doing all deletions 
first: simply traverse the trace twice. On the first 
traversal, if two adjacent pairs are found that require two 
characters to be moved closer together, then do a deletion. 
On the second traversal, if two adjacent pairs are found 
that require two characters to be moved farther apart, then 
do an insertion. 

After these insertions and deletions have been done, 
all that remains is to redraw those characters that are not 
preserved by the trace. 
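The two-traversal scheme just described (all deletions, then all insertions, then redraw) can be sketched as below. This is an illustration under stated assumptions, not the paper's code: `update_line` is a hypothetical name, the trace is given as an increasing list of 1-based (i, j) pairs that join equal characters, and the fixed screen width is not simulated.

```python
def update_line(old, new, trace):
    """Turn `old` into `new`, preserving the characters joined by `trace`."""
    line = list(old)
    # Sentinel pairs make the text before the first and after the last pair
    # look like ordinary gaps between adjacent pairs.
    pairs = [(0, 0)] + list(trace) + [(len(old) + 1, len(new) + 1)]
    # For each adjacent pair: i of the left pair, unmatched count in old, in new.
    gaps = [(i1, i2 - i1 - 1, j2 - j1 - 1)
            for (i1, j1), (i2, j2) in zip(pairs, pairs[1:])]

    # First traversal: where two preserved characters must move closer, delete.
    off = 0  # net shift of positions caused by edits further left
    for i1, da, db in gaps:
        if da > db:
            start = i1 + off
            del line[start:start + (da - db)]
            off -= da - db

    # Second traversal: where they must move farther apart, insert blanks.
    off = 0
    for i1, da, db in gaps:
        if db > da:
            start = i1 + off
            line[start:start] = [" "] * (db - da)
        off += db - da  # accounts for this gap's deletion or insertion

    # Finally redraw every position of `new` that the trace does not preserve.
    kept = {j for _, j in trace}
    for j in range(1, len(new) + 1):
        if j not in kept:
            line[j - 1] = new[j - 1]
    return "".join(line)

result = update_line("abcbst", "bctsbt", [(2, 1), (3, 2), (4, 5), (6, 6)])
```

Because every deletion happens before any insertion, the intermediate line is never longer than the longer of the old and new lines, which is the property the text relies on to avoid losing characters at the right margin.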


[Figure: the example line shown after each of three steps, labelled Deletion, Insertion, and Redraw.]

Figure 4-1: Execution of the simple redisplay


Figure 4-1 is an example of the insertions, deletions
and writes that need to be done in order to perform the
transformation indicated by the example in figure 2-1.
To formalize this, look at $D_{i,j}$ and consider how the
transformation represented by it is achieved given the
ability to achieve the three neighbouring transformations.



Figure 4-2: A single cell of D 


To achieve our target transformation, the conversion
of the first i elements of A to the first j elements of B,
there are three cases, which correspond to the three
cases in the proof of theorem 1.

1. The optimal transformation can be derived from
$D_{i-1,j-1}$ by first transforming the first i-1 elements of A
to the first j-1 elements of B, then doing the
transformation of the element $A_i$ to $B_j$. It is
important to note that element $A_i$ will be in the same
string position in the intermediate string as $B_j$ in the
target string after the (i-1, j-1) transformation has been
done because of the way that insert/delete
operations behave: namely that all following
elements get moved.




2. The optimal transformation can be derived from
$D_{i-1,j}$ by first deleting $A_i$ and then transforming the first
i-1 elements of A to the first j elements of B. In the
other two cases we solve the subproblem first, then
extend that solution to a solution of the full
problem. Here we reverse the order (delete then
solve the subproblem) because we want to avoid
intermediate strings longer than B. If the deletion
was done last, then after the solution of the
subproblem we could have an intermediate string of
length > |B|, which because of the constraints of the
display would be truncated to the first |B| elements.

3. The optimal transformation can be derived from
$D_{i,j-1}$ by first transforming the first i elements of A to
the first j-1 elements of B, then doing an insert
operation at position j and the transformation of the
null element that resulted from the insertion to $B_j$.

This leads to the following algorithm: 

Algorithm 4:

    procedure Redisplay (i, j) is
    begin
        if (i,j) = (0,0) then
            return;
        end if;
        case W[i,j] in
            when transformation =>
                Redisplay (i-1, j-1);
                TransformElement (j, A[i], B[j]);
            when deletion =>
                DeleteElement (i);
                Redisplay (i-1, j);
            when insertion =>
                Redisplay (i, j-1);
                InsertElement (j);
                TransformElement (j, null, B[j]);
        end case;
    end
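The recursion of Algorithm 4 can be exercised on a simulated line, as in the following sketch. It is an illustration only: the names `fill` and `redisplay_line` are hypothetical, the screen is modelled as a Python list without a right margin, ties in W are broken in the order transformation, deletion, insertion, and the deletion is applied at position i, which is where $A_i$ still sits because the subproblem has not yet been processed.

```python
import math

def fill(a, b, c_t, c_i, c_d):
    """Cost matrix of Theorem 1; the paper's A[i] is a[i - 1] here."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                D[i][j] = min(D[i][j], D[i - 1][j - 1] + c_t(a[i - 1], b[j - 1]))
            if i > 0:
                D[i][j] = min(D[i][j], D[i - 1][j] + c_d(a[i - 1]))
            if j > 0:
                D[i][j] = min(D[i][j], D[i][j - 1] + c_i(b[j - 1]))
    return D

def redisplay_line(old, new, c_t, c_i, c_d):
    """Apply Algorithm 4 to a simulated line; return (final line, redrawn positions)."""
    a, b = list(old), list(new)
    D = fill(a, b, c_t, c_i, c_d)
    line = list(a)   # position p on the simulated screen is line[p - 1]
    redrawn = []     # positions at which a character was actually written

    def transform_element(p, old_ch, new_ch):
        if old_ch != new_ch:            # equal characters need no output
            line[p - 1] = new_ch
            redrawn.append(p)

    def redisplay(i, j):
        if (i, j) == (0, 0):
            return
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + c_t(a[i - 1], b[j - 1]):
            redisplay(i - 1, j - 1)
            transform_element(j, a[i - 1], b[j - 1])
        elif i > 0 and D[i][j] == D[i - 1][j] + c_d(a[i - 1]):
            del line[i - 1]             # delete first, then solve the subproblem
            redisplay(i - 1, j)
        else:
            redisplay(i, j - 1)
            line.insert(j - 1, None)    # open a slot at position j
            transform_element(j, None, b[j - 1])

    redisplay(len(a), len(b))
    return line, redrawn

lcs_t = lambda x, y: 0 if x == y else math.inf
line, redrawn = redisplay_line("abcbst", "bctsbt", lcs_t, lambda y: 1, lambda x: 0)
```

With the cost functions of the example, only the two inserted characters are written; the four preserved characters are moved by the insert and delete operations alone.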

Referring back to the set of display properties given at 
the beginning of this section, a strong symmetry between 
line and character operations is apparent. A line can be 
viewed as a string of characters, and a screen as a string of
lines. Algorithm 2 can be employed in three ways in a 
complete redisplay algorithm: 

1. To update individual lines, as has already been 
described. 

2. To move lines, rather than characters, using the line 
insertion/deletion operations; minimizing the 
amount of work done. 

3. As a cost function to determine the similarity 
between two lines. 

This suggests the following algorithm to do the total 
redisplay: 


Algorithm 5:

    call Algorithm2 (
        A = string of lines from old image,
        B = string of lines from new image,
        C_d(A[i]) = cost of deleting a line at i,
        C_i(B[j]) = cost of inserting a line at j + C_t(null, B[j]),
        C_t(A[i],B[j]) = call Algorithm2 (
            A = A[i],
            B = B[j],
            C_d, C_i, C_t = those for the one-line case.
            { this invocation merely returns a cost,
              and does not touch the screen. }
        )
    );

    call Algorithm4 (
        A = string of lines from the old image,
        B = string of lines from the new image,
        The procedures InsertElement and DeleteElement
        do line oriented operations,
        TransformElement(p,old,new) is
        begin
            call Algorithm2 (
                A = old,
                B = new,
                C_d, C_i, C_t = those for the one-line case.
            );
            call Algorithm4 (
                A = old,
                B = new,
                The procedures InsertElement, DeleteElement,
                and TransformElement do character
                oriented operations on line p
            );
        end;
    );

If l is the length of a line, and s is the number of lines
on the screen, then this algorithm takes $O(s^2 l^2)$ time.
While this algorithm does an excellent job of minimising
the number of characters transmitted, its runtime is
unacceptable, even given a clever implementation [2]. So
some compromises have to be made. In computer science,
compromises are usually called "heuristics".


5. Performance heuristics 

Most of the time used by algorithm 5 is consumed by 
the inner invocation of algorithm 2 as the cost function. 
Other, cheaper, cost functions can be used, but optimality 
is lost. Since $C_i$ and $C_d$ already take constant time, no
improvements can be made there. Large improvements
can be made by speeding up the evaluation of $C_t$, the cost
of transforming one line to another.





One possibility for $C_t$ is:

$C_t(A_i,B_j) = 0$ iff $A_i = B_j$ and $|B_j|$ otherwise.

which still takes $O(l)$ worst case time since $A_i$ and $B_j$ are
strings of length l. This can be sped up at the cost of
some accuracy by preprocessing the two arrays of strings,
computing a hash value for each string. Then two strings
can be compared in constant time:

$C_t(A_i,B_j) = 0$ iff $A^h_i = B^h_j$ and $|B_j|$ otherwise,

where $A^h_i$ and $B^h_j$ are the hash values of $A_i$ and $B_j$.

Doing a string comparison only takes $O(l)$ time in the
worst case. The expected time for a comparison is actually
a constant which depends on the size of the alphabet,
assuming that when strings are compared the comparison
stops when two differing characters are encountered. For
an alphabet size of 2 and a uniform distribution of
characters the expected number of comparisons is 2; for
larger alphabet sizes it is less. In the process of doing a
string comparison a count of the number of matching
characters may be kept, resulting in a measure of the two
lines' similarity, rather than just a match/nomatch
indication. However, the overhead may still make the
hash comparison method preferable.

The choice of $C_t$ can be influenced by the intended
application. For example, the project that originally
motivated the research described by this paper was a
structure editor; i.e. an editor which manipulates a tree
representation of the program, and the tree representation 
is reflected back on the screen with the appearance of a 
conventional program. A common operation is to embed 
a series of statements within a begin-end pair. The motion 
that one would like to see on the screen is for two lines to 
be inserted, in which the strings "begin" and "end" are 
written, and for all intervening lines to be moved right on 
the screen. This was done by using a hash of the contents 
of each line that ignored leading and trailing blanks. It 
has been satisfactory. 
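A hash-based line cost of the kind described above might look like the sketch below. The names `line_hash` and `make_line_cost` are hypothetical, and CRC-32 from Python's standard `zlib` module merely stands in for whatever hash function the editor actually used, which the text does not specify.

```python
import zlib

def line_hash(line):
    """Hash of a line's contents, ignoring leading and trailing blanks."""
    return zlib.crc32(line.strip(" ").encode("utf-8"))

def make_line_cost(old_lines, new_lines):
    """Preprocess both images once, then compare lines in constant time."""
    old_h = [line_hash(s) for s in old_lines]
    new_h = [line_hash(s) for s in new_lines]

    def c_t(i, j):
        # 1-based indices, as in the text: 0 if the hashes agree,
        # otherwise the length of the new line (it must be redrawn).
        return 0 if old_h[i - 1] == new_h[j - 1] else len(new_lines[j - 1])

    return c_t

old = ["x := 1;", "y := 1;"]
new = ["begin", "    x := 1;", "end"]
c_t = make_line_cost(old, new)
```

Because blanks are stripped before hashing, a line that is merely indented further still matches its old copy at zero cost, so the line permutation phase can move it and leave the horizontal shift to the intra-line update.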

This gives us an $O(s^2)$ time line permutation phase,
preceded by $O(sl)$ time for preprocessing. This is
followed by an $O(sl^2)$ time phase to update individual
lines.
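To get a feel for these bounds, assume a screen of s = 24 lines of l = 80 characters (an assumption for illustration; the text does not fix a screen size) and ignore constant factors. The full algorithm 5 and the three phases above then compare as:

$$s^2 l^2 = 24^2 \cdot 80^2 = 3\,686\,400$$

$$s^2 + sl + sl^2 = 576 + 1\,920 + 153\,600 = 156\,096$$

Almost all of the remaining work is in the $sl^2$ term, which is why the intra-line update is the next target.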

A cheaper method of doing the intra-line update is 
needed that doesn't compromise effectiveness too much. 
Many methods are possible, but the following has proved 
effective: Most non-total changes to a line affect only one 
small subpart. For example, inserting or deleting a 
character, or changing an identifier. The old and new 
lines can each be broken into three subparts: a leading 
match, a trailing match, and a differing string in the 
middle. The leading match is the longest common prefix 
of the two strings, the trailing match is the longest
common suffix of the two strings, and the differing strings 
are just the regions in the two strings between the leading 
and trailing matches. Once this partitioning has been 
done, all that has to be done is to move the trailing match 
sequence in the original line with insert/delete character 
operations so that it lines up with the corresponding string 
in the new line, and redraw the central difference. 
Allowance must be made for the costs of performing the 
various operations, but this is simply a long and tedious 
case analysis. 
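The partitioning step can be sketched as follows. The function name `split_match` is hypothetical, and the cost-based case analysis mentioned above is deliberately left out; the sketch only finds the three subparts and the amount by which the trailing match must move.

```python
def split_match(old, new):
    """Return (lead, old_middle, new_middle, trail) for two lines.

    lead  = length of the longest common prefix
    trail = length of the longest common suffix that does not overlap the prefix
    """
    limit = min(len(old), len(new))
    lead = 0
    while lead < limit and old[lead] == new[lead]:
        lead += 1
    trail = 0
    while (trail < limit - lead
           and old[len(old) - 1 - trail] == new[len(new) - 1 - trail]):
        trail += 1
    return lead, old[lead:len(old) - trail], new[lead:len(new) - trail], trail

lead, old_mid, new_mid, trail = split_match("abcyyydef", "abcxxdef")
# A negative shift means delete-character operations, a positive one insert-character.
shift = len(new_mid) - len(old_mid)
```

In this example the trailing match must move one position to the left, after which only the central difference needs to be redrawn.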


[Figure: an old and a new line, each divided into a leading match, a central difference, and a trailing match, with the steps labelled redraw and delete.]

Figure 5-1: Intra-Line update


So, our final redisplay algorithm is: 

Algorithm 6:

    call Algorithm2 (
        A = string of lines from old image,
        B = string of lines from new image,
        C_d(A[i]) = cost of deleting a line at i,
        C_i(B[j]) = cost of inserting a line at j + C_t(null, B[j]),
        C_t(A[i],B[j]) = 0 iff hash(A[i]) = hash(B[j]) and |B[j]| otherwise.
    );

    call Algorithm4 (
        A = string of lines from the old image,
        B = string of lines from the new image,
        The procedures InsertElement and DeleteElement
        do line oriented operations,
        TransformElement(p,old,new) uses the
        technique just outlined.
    );


The values of $C_i$ and $C_d$ are affected by display screens
having a fixed length: when you do an insertion, the last
line on the screen is deleted, and when you do a deletion,
a blank line is inserted after the last line. Effectively for
each insertion you get a free deletion, and for each
deletion you get a free insertion at the bottom of the
screen. This can be handled by setting $C_i(B[j]) = 0$ when
evaluating $D_{|A|,j}$ and $C_d(A[i]) = 0$ when evaluating $D_{i,|B|}$.
The corresponding insertion and deletion operations can
be omitted when doing the redisplay.


6. Conclusion 

The redisplay algorithm described in this paper is used 
in an Emacs-like editor for Unix and a structure editor.
Its performance has been quite good: to redraw
everything on the screen (when everything has changed) 
takes about 0.12 seconds CPU time on a VAX 11/780 
running Unix. Using the standard file typing program, 
about 0.025 seconds of CPU time are needed to type one 
screenful of text. Emacs averages about 0.004 CPU 
seconds per keystroke (with one call on the redisplay per 
keystroke). 

Although in the interests of efficiency we have stripped
down algorithm 5 to algorithm 6, the result is still an
algorithm which has a firm theoretical basis and which is 
superior to the usual ad-hoc approach. 




7. Acknowledgements 

The people who did the real work behind this paper 
are Mike Kazar, Charles Leiserson and Craig Everhart; all
from CMU. 




Bibliography 

1. Kevin Q. Brown. Dynamic Programming in Computer
Science. CMU, February 1979.

2. Craig Everhart. Personal communication.

3. James Gosling. Fred: a screen editor for Unix. CMU
CSD, 1979. Unpublished manual.

4. B. S. Greenberg. The Multics Emacs Redisplay
Algorithm. Honeywell Inc., 1979.

5. D. S. Hirschberg. "A linear space algorithm for
computing maximal common subsequences." CACM 18
(1975), 341-343.

6. J. W. Hunt and T. G. Szymanski. "A Fast Algorithm
for Computing Longest Common Subsequences." CACM
20 (1977), 350-353.

7. W. J. Masek and M. S. Paterson. A Faster Algorithm
for Computing String Edit Distances. Tech. Rept. 105,
MIT, May 1978.

8. Richard M. Stallman. EMACS manual for TWENEX
users. MIT AI Lab, 1980.

9. R. A. Wagner and M. J. Fischer. "The string-to-string
correction problem." JACM 21, 1 (January 1974), 168-173.

10. C. K. Wong and A. K. Chandra. "Bounds for the
String Editing Problem." JACM 23 (1976), 13-16.





Design of the PEN Video Editor Display Module 

David R. Barach 
David H. Taenzer 
Robert E. Wells 

Bolt Beranek and Newman Inc.


Abstract 

PEN, a new portable video editor, uses a
number of simple but effective techniques. Most
are not new, but are unavailable in the literature.
We will describe our goals for PEN's display
module, discuss implementation alternatives and
describe in detail the techniques used in the editor.


Introduction

We began to work on the PEN editor as the
result of our attempts to find an editor which met
several criteria. The editor should be portable,
with use on the entire PDP-11/VAX family of computers
and several of the popular operating systems
(UNIX, RT-11, and RSX-11M) as short-term goals.
The editor should run acceptably in the limited
memory environment of an unmapped LSI-11, yet be
able to take advantage of the address space available
on the VAX. The editor should be of high
quality, in terms of internal cleanliness as well
as in terms of external specification. Finally,
the editor should be easy to change and expand as
we saw fit.

We spent a considerable period of time think- 
ing about how to adapt an available editor, such as 
NED [2,6]. We came to the conclusion that some com- 
bination of non-portable code, or unacceptable ex- 
pansion potential ruled out such an effort. So, we 
decided to try our hands at the design and imple- 
mentation of a portable editor. (In fact, PEN is an 
acronym for Portable Editing Nucleus...) 

Management allowed us to spend a great deal of 
time designing, implementing and re-implementing 
parts of PEN; we feel that for its size, PEN is one 
of the cleanest C-programs we have seen. This pa- 
per is an attempt to share some of the techniques 
we decided to use. 


Permission to copy without fee all or part of this material is granted 
provided that the copies are not made or distributed for direct 
commercial advantage, the ACM copyright notice and the title of the 
publication and its date appear, and notice is given that copying is by 
permission of the Association for Computing Machinery. To copy 
otherwise, or to republish, requires a fee and/or specific permission. 

© 1981 ACM 0-89791-043-5/81/0600/0130 $00.75


Many of them are not new; however, they are not 
widely available in the literature. Several are in 
fact extremely simple, as well as surprisingly ef- 
fective. We hope that the discussion which follows 
will provide a useful basis for the future develop- 
ment not only of video editors, but also of other 
kinds of programs which present information on a 
video screen. 

This paper will examine in detail PEN's
display module, which we find particularly interesting.
We will discuss the display module's
structural relationship to PEN as a whole, describe
our goals for it, discuss some implementation
choices we considered, and present the algorithms
we chose.

PEN has three independent modules: the command
interpreter, the display handler, and the data base
handler. The command interpreter parses user keystrokes
and invokes routines that may modify internal
data structures. The display handler updates
the terminal screen so that it is an accurate
representation of the internal data structure. The
data base handler converts the internal data structures
into external data structures (files). The
following diagram illustrates PEN's structure:


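            +-------------+
            |   command   | <===== (keyboard)
            | interpreter |
            +-------------+
               /       \
              /         \
             /           \
    +----------+     +-----------+
    | display  |     | data base | <===> (file
    | handler  |     |  handler  |        system)
    +----------+     +-----------+

    +----------+
    | terminal | =====> (screen)
    | handler  |
    +----------+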


These modules interact almost completely via 
the internal data structures. Routines that 
correspond to user commands do not have to worry 
about the display, and so on. This leads to a very 
simple top level control structure. A submodule of 
the display handler provides the terminal indepen- 
dence. 




Goals of the Display Module 


Interruptability

We had five major goals for PEN's display
module. First, we wanted the display to be interruptable
in order to minimize unwanted updating of
the screen. The display module should stop updating
the screen when the user types new characters,
such as in two successive scroll commands. The
display module must keep enough screen state information
to allow updating to be cleanly restarted.

Bandwidth Minimization 

Our second major goal for the display module 
was to minimize the number of characters sent to 
the terminal to update the screen. This is impor- 
tant to minimize the I/O bandwidth necessary for 
acceptable performance. 

Terminal Independence 

We also wanted to make PEN run on a variety of 
terminals, using as many terminal specific features 
as possible. A subgoal was to allow PEN to run on 
new terminals without changing the program. This 
is very important for widely distributed programs. 


As a general principle we wanted to minimize 
control and data paths between the display module 
and the rest of the program. This not only makes 
it easier to write command handling routines, but 
also means that these routines should not have to 
be modified if the display module changes, and vice 
versa. So, too, it reduces the amount of wizard 
level knowledge about the display required to add 
new commands. 


Finally, we wanted the editor to support mul- 
tiple windows on the screen in a reasonably com- 
fortable fashion. By windows we mean a section of 
the screen used to display the contents of an indi- 
vidual file. A multiple window display lets the 
user work with several files simultaneously. This 
is particularly useful when the contents of one 
file provide a context for editing another file. 
For example, one can see the error messages pro- 
duced by a compiler while searching for their 
causes. 


Implementation Alternatives 

In order to reach these goals, we felt that
it was imperative that we consider them as a
group and not as separate problems. In other editors
where problems such as interruptability have
been attacked as afterthoughts, the results have
been difficult to extend, to modify and to understand.
The basic problem is that the implementation
of these interrelated display goals requires a
consistent global design.


Interruptability

As mentioned above, display interruptability
requires that the display module maintain information
about the state of the screen. Information
can be kept on at least two levels of granularity.
If the display module has no information about the
screen contents, it must repaint the screen completely
after the display is restarted. On the
other hand, if the display module remembers every
character on the screen, it can both restart from
any position and use character and line modification
strategies to optimally update the screen.

The first technique is simple and requires no
storage, but does not perform particularly well.
For example, the NED [1,6] editor interrupts the
display only for commands that repaint the entire
screen. The second technique, which EMACS [3,8,9]
uses, can produce optimal results but requires a
large data base and a complex set of algorithms.

A compromise between these two extremes is to 
keep enough information to determine if a particu- 
lar line is still anywhere on the screen. With 
that information, one can use line insertion and 
deletion strategies to move lines around on the 
screen. Only lines which have been invalidated 
must be repainted when the display restarts. One 
can keep the required information either in the
form of a hash code for each line, or of a unique 
line identifier that is changed whenever the line 
is modified. 
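The bookkeeping this compromise calls for is small. The sketch below is only an illustration of the idea, not PEN's actual data structure: `plan_repaint` is a hypothetical name, each screen row is assumed to carry the identifier (or hash code) of the line it currently shows, and `None` marks a row that has been invalidated.

```python
def plan_repaint(screen_ids, wanted_ids):
    """Split the wanted rows into those already somewhere on the screen
    (candidates for line insertion/deletion) and those needing a repaint."""
    on_screen = {lid for lid in screen_ids if lid is not None}
    reuse = [row for row, lid in enumerate(wanted_ids) if lid in on_screen]
    repaint = [row for row, lid in enumerate(wanted_ids) if lid not in on_screen]
    return reuse, repaint

# Row 2 of the screen was invalidated by an interrupted update.
reuse, repaint = plan_repaint([10, 11, None, 13], [11, 12, 13, 14])
```

Since the plan is computed from the recorded screen state alone, it comes out the same whether the previous update finished or was interrupted.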

The goal of any editor's display process is to
make the visible representation of a file
correspond to its internal form. In a non-interruptable
display, the screen and the internal
form should differ for only one reason: some
command/action function has changed the internal
form. In a fully interruptable situation, the
display module will not always have time to finish
updating the screen. PEN's display module, for example,
neither knows nor cares which of these cases
obtains when it is invoked. We think it important
that the display module not care; the display
should be restartable from its normal entry point
in either case.

Bandwidth Minimization 

Our second goal, the minimization of I/O 
bandwidth required for adequate performance, is 
central to the display module of any successful ed- 
itor. An editor should make the screen correct as 
quickly as possible, while trying to avoid 
transmitting more characters to the terminal than 
absolutely necessary. Ironically, certain terminals 
make these goals conflict; we consider elapsed time 
more important than length of output. 

Editor designers must consider three interre- 
lated factors when deciding how much output minimi- 
zation to include. They must consider how much 
space the optimization data base will consume, how 
long the algorithms involved will take to work, and 
their complexity. An extremely complicated algo- 
rithm with a few bugs is probably worse than none 
at all! 




Interestingly enough, near optimal screen
repainting can confuse the user, since large logical
changes sometimes result in little (or worse,
no) screen change. This isn't an argument for
suboptimal performance; rather, it is a comment
about human expectations.

Terminal Independence 

Our third goal, that of terminal independence, 
is particularly important in the design of a suc- 
cessful editor. By terminal independence we mean 
that there should be one version of the editor 
which will run on almost any terminal. This allows 
several users to share the editor's instruction 
space (which is an important efficiency considera- 
tion on minicomputers). Also, modifications do not 
need to be repeated on several "slightly different" 
versions of the program. 

We know of three general ways to provide ter- 
minal independence. One can write a terminal 
specific function in the editor for each kind of 
terminal. The availability of dynamic loading 
makes this attractive. Such procedural solutions 
are extremely flexible and make it easy to handle 
problems such as speed dependent padding. 

One can also use a separate process which acts 
as a translator between terminal and the editor. 
This process acts as a simulator for a convenient 
virtual terminal, which can be smarter than any ac- 
tual terminal. Under UNIX, the interprocess com- 
munication can be done via "pipes" which make the 
translation process invisible to the editor. New 
terminals can be supported without changing the ed- 
itor program by providing new virtual terminal 
translator processes. If the virtual terminal is 
designed correctly, the editor can be simplified by 
putting most of the terminal handling complexity in 
the translator process. Also, the virtual terminal 
can take advantage of any local terminal intelli- 
gence, no matter how obscure. Unfortunately, most 
operating systems make this approach too ineffi- 
cient to use. 

A third approach to terminal independence is 
to have the editor read in an initialization file 
which defines the terminal's properties and deter- 
mines how the editor should interact with it. Un- 
like the two previous procedural techniques, this 
approach is data driven and therefore less flexi- 
ble. Since no two terminals are alike, the editor 
designer must take great care to identify a set of 
terminal description parameters which span the 
space of available terminals. 

For example, each kind of terminal uses a dif- 
ferent scheme for moving the cursor to a given 
(x,y) location on the screen. Some terminals want 
the x coordinate before the y coordinate, and other 
terminals require the opposite. Some send these 
values as ASCII characters while others require the 
value to be added to a constant, such as 40 octal. 
Each terminal has a different sequence that invokes 
cursor motion. Some require separators between the 
x and y coordinates, and some need terminating se- 
quences. Some terminals think of the upper left 
hand corner of the screen as being at coordinate 
(0,0) while others think of it as (1,1). Furthermore,
some terminals require padding characters to be sent
after the cursor motion command. The
amount of padding can depend not only on the line 
speed, but also on the distance the cursor moves. 
These variations are more difficult to handle in a 
data driven scheme than in a procedural approach. 

The editor should be able to take advantage of 
as many local terminal features as possible. Some 
editors use only features common to all terminals, 
such as cursor motions and screen erasure. We 
think that it is better to take advantage of spe- 
cial features when possible. For example, most but 
not all terminals have an erase to end of line 
feature, which can greatly speed up screen updating 
[5]. 


As we mentioned above, we think that it is im- 
portant to separate the display module as much as 
possible from the rest of the program. This makes 
it much easier to write simple command-level functions.
There are several approaches to this problem, which
affect the design of the data structures used elsewhere
in the program.


If the display is totally independent of the 
rest of the program, it must maintain all of the 
information necessary to update the screen. The 
database section of the program can then use what- 
ever data structures are convenient. The display 
converts this to a screen representation, deter- 
mines how this screen representation differs from 
its predecessor and sends the appropriate updates 
to the screen. This scheme has the advantage that 
the command level and file database sections of the 
program do not have to concern themselves with the
display. Unfortunately, this model leads to large 
data space requirements and extra code for convert- 
ing between the database and the screen representa- 
tions. 


At the other end of the spectrum, one can 
design a display module that is dependent on the 
command handlers for information on which parts of 
the screen should be updated. Each command has to 
know whether the text it is modifying is on the 
screen, an unnecessary restriction. The NED editor 
works this way. This was a major source of problems 
in enhancing NED. Just as the display should be 
thought of as a goal oriented process which at- 
tempts to make the screen properly represent the 
contents of an internal data structure, the command 
handlers should be thought of as small independent 
agents which modify the internal data structure 
without reference to an external screen representation.


A third approach is to design the internal 
data structures so they contain information to help 
the display update the screen. This usually con- 
sists of remembering which sections of the internal 
data representation have been modified by user lev- 
el commands. The easiest way to make this scheme 
work is to design consistent internal and external 
data representations. The major disadvantage of 
this scheme is that the two representations are no 
longer independent and changes to one may affect
the other. If you want to replace the display 
module with another version, the new module must 
work with a similar data representation. 



Multiple Windows 

The last major goal of the display module is 
to represent multiple windows on the screen. Mul- 
tiple windows should be an integral part of the 
design of the display module, since "band-aid" 
solutions to this problem usually lead to long 
nights trying to find "the last display bug". The 
standard trick problem concerns the mapping of a 
single file into more than one window on the 
screen. Also, the display designer must be careful 
when handling vertical (side by side) windows. 

It is possible to put window information into 
the internal data structures associated with files; 
we will call these internal structures "buffers". 
This leads to problems when buffers are mapped into 
more than one position on the screen. Also, it is 
often convenient to have buffers which are not 
displayed. This leads to inefficiencies when the 
display module has to determine which buffer is as- 
sociated with a particular position on the screen. 
A more efficient approach to this problem is to 
have separate window structures which are associat- 
ed with buffers. 


With all of these issues in mind, we will now
describe how PEN actually works. You will notice
that in some cases we chose sub-optimal solutions
to some of the display goals, generally as a result
of address space constraints. The display module
uses four data structures in PEN: the buffer,
line, window, and mark. For each structure we will
give a brief functional description and an overview
of its contents.


Internal Data Structures

The buffer structure is the internal representation
of a file. There are also "internal" buffers which
are used to save deleted text, move text from one
buffer to another, etc. Buffers contain:

. buffer name - the file name or internal buffer
name
. buffer status flags
. file protection modes
. temporary file names associated with the
buffer
. linked list of line structures
. number of top line in line list
. number of lines in line list
. pointer to current line in line list and its
line number
. linked list of mark structures associated with
the buffer

The line structure, the internal representation
of a line in a file, contains:

. character array which holds the data
. number of characters allocated
. number of characters used
. unique line number and updated bit (see below)

The window structure contains all the information
about a section of the screen which is used to
display a buffer:

. window flags
. pointer to buffer
. cursor line and column number in window
coordinates
. buffer coordinates of upper left hand corner
of window
. screen coordinates of upper left hand corner
of window
. height and width of window
. length of each window line at last display
. buffer coordinates of first column of each
window line at last display
. vector of unique line id's to do scrolling
optimization
. vectors describing columns on each line that
must be repainted
. pointer to mark for window
. window coordinates of mark when last displayed

The mark structure, which is used to remember
and relocate a position in a buffer, contains:

. mark flags
. buffer coordinates of mark
. optional display representation of mark


Control Structure

The control structure of the display module is
very simple. For each dirty window, PEN loops over
all lines and determines whether they are already
correctly displayed. If not, and the editor decides
that the line is elsewhere on the screen, PEN
uses local scrolling (if available) to move the
line to its correct position. If the line cannot
be fixed with scrolling, PEN uses information in
the window structure to update the portion of the
line that has changed. Once all lines in the window
have been correctly displayed, PEN redisplays
the mark if necessary.

The display module takes a flag to determine
if it should display only special priority windows.
In its normal mode (with this flag clear), PEN
checks before the display of each line to see if
the user has typed any characters. If there is
type-ahead, the display module simply returns, and
finishes updating the screen the next time it is
called. The current window is displayed first,
which means that user input is echoed before other
windows are updated.

If the flag is set, the display module only
displays windows that have a special priority flag
set in the window structure. These windows are
updated regardless of user type-ahead. We use such
windows for putting up messages and warnings to the
user.
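The interruptible control flow of the Control Structure section can be illustrated with a short Python sketch. This is not PEN's code: the pared-down `Window` record and the names `update_display`, `typeahead_pending`, and `repaint_line` are hypothetical, and scrolling and mark redisplay are left out so that only the priority flag and the type-ahead check remain visible.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Window:
    """Hypothetical, pared-down window record (not PEN's actual layout)."""
    name: str
    priority: bool = False                            # special priority flag
    wanted: List[str] = field(default_factory=list)   # lines the buffer holds
    shown: List[str] = field(default_factory=list)    # lines on the screen

    def dirty(self) -> bool:
        return self.wanted != self.shown


def update_display(windows: List[Window], current: Window, priority_only: bool,
                   typeahead_pending: Callable[[], bool],
                   repaint_line: Callable[[Window, int, str], None]) -> bool:
    """Return True if the work is finished, False if type-ahead interrupted it."""
    # The current window goes first so that user input is echoed early.
    ordered = [current] + [w for w in windows if w is not current]
    for win in ordered:
        if priority_only and not win.priority:
            continue
        if not win.dirty():
            continue
        # Make the screen image the same height as the wanted image.
        win.shown += [""] * (len(win.wanted) - len(win.shown))
        del win.shown[len(win.wanted):]
        for row, text in enumerate(win.wanted):
            # In normal mode, give up as soon as the user has typed ahead;
            # the next call resumes because the window is still dirty.
            if not priority_only and typeahead_pending():
                return False
            if win.shown[row] != text:
                repaint_line(win, row, text)
                win.shown[row] = text
    return True
```

Because the function only compares the wanted image with the displayed one, an interrupted call loses nothing: whatever is still wrong on the screen is found again on the next call.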





Scrolling Algorithm 

PEN has a small set of routines which manipu- 
late buffers, through which the rest of the editor 
obtains access to them. These routines update a 
unique line id value which is associated with each 
line structure in the buffer's line list. The win- 
dow structure contains an array of these line ids 
which correspond to the lines that have been updat- 
ed on the screen by the display module. Thus, the 
decision to update the screen representation of a 
line is based on a simple comparison of these line 
id values. 

When a command level function makes changes to
the buffer line list, at least one line id value in
the window structure will not match the current id 
value in the buffer line list, indicating that at 
least one line needs to be updated on the screen. 
One could also hash code the line's contents, but 
this requires more computation and can lead to col- 
lisions. 

The unique line id scheme also lends itself to 
optimizing the display of line insertions and dele- 
tions by the use of terminal scrolling commands. 
The display algorithm compares the window line id 
array to the line id values in the buffer line 
list. If the line ids match, the line is correct 
on the screen. If they differ, there are three 
cases: The line may have changed, perhaps a word 
has been deleted. Or, the screen may contain one 
or more lines that have been deleted. In this case
the line id from the buffer line list will match a 
line farther down in the window line id list. Fi- 
nally, a line (or lines) may have been inserted be- 
fore the current line on the screen. In this case 
the window line id will appear later in the buffer 
line list. 

This makes the scrolling algorithm reasonably 
simple. For each window, the display module 
searches the window line id list trying to find the 
id value in the buffer line list. If the line id 
from the window fails to match the value in the 
buffer line list, PEN determines which of the above 
mentioned cases obtains. In the two cases where 
scrolling helps, PEN issues a terminal specific 
command to scroll the window, and updates the win- 
dow line id list. PEN then scrolls the window line 
id array in the same way it has just scrolled the 
screen, namely from the current line to the bottom 
of the window. Note that this problem is similar 
to that of comparing two files and printing a line 
by line list of differences. 
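The three cases above can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not PEN's implementation: `plan_updates` and the operation names are hypothetical, both lists are assumed to hold one id per window row, and `None` stands for a row whose contents are unknown after scrolling.

```python
from typing import List, Optional, Tuple

Op = Tuple[str, int, int]   # (operation, first row, number of lines)


def plan_updates(window_ids: List[Optional[int]], buffer_ids: List[int]) -> List[Op]:
    """Plan the screen operations that make window_ids equal buffer_ids.

    window_ids is scrolled in place, from the current row to the bottom of
    the window, in the same way the terminal scrolls the screen.
    """
    height = len(window_ids)
    ops: List[Op] = []
    for row in range(height):
        want, have = buffer_ids[row], window_ids[row]
        if want == have:
            continue                          # line is already correct
        if want in window_ids[row + 1:]:
            # Deleted lines are still shown: the wanted line is farther down.
            k = window_ids.index(want, row + 1) - row
            ops.append(("delete_lines", row, k))
            del window_ids[row:row + k]
            window_ids.extend([None] * k)     # bottom rows are now blank
        elif have in buffer_ids[row + 1:]:
            # Lines were inserted before the line currently on this row.
            k = buffer_ids.index(have, row + 1) - row
            ops.append(("insert_lines", row, k))
            window_ids[row:row] = [None] * k
            del window_ids[height:]           # lines pushed off the bottom
        if window_ids[row] != want:
            # Either the line itself changed or the row was opened by a scroll.
            ops.append(("repaint", row, 1))
            window_ids[row] = want
    return ops
```

For example, with `window_ids = [1, 2, 3, 4, 5]` and `buffer_ids = [1, 4, 5, 6, 7]` (two lines deleted), the plan is one two-line delete at row 1 followed by repaints of the two rows exposed at the bottom.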

PEN's Coordinate Spaces

This algorithm is quite simple because there 
is a one to one correspondence between lines in the 
window and lines in the buffer. Lines that are too 
long to fit in the window are not wrapped to the 
next line (ala EMACS). Instead, PEN draws special 
symbols at either end of each screen line to indi- 
cate that there is more data to the right or left 
of the window. In addition, tabs in files are 
represented internally with special space charac- 
ters we call "gray space." PEN converts sequences 
of gray space characters back to tabs during file 
saving. So, too, the internal representation of a 
control character consists of two special characters.
They are displayed as a special meta character followed
by the corresponding upper case ASCII mapping of the
character.
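The gray space idea can be sketched in a few lines of Python. The sketch rests on assumptions the text does not state: tab stops every 8 columns, a single hypothetical marker character `GRAY`, and lines that contain no control characters other than tabs. How PEN treats a gray run that an edit has cut short is not described; here any remaining run is simply written back as one tab.

```python
GRAY = "\x80"      # hypothetical internal "gray space" character
TAB_WIDTH = 8      # assumed tab stop spacing


def expand_tabs(text: str) -> str:
    """Replace each tab by gray space up to the next tab stop."""
    out = []
    for ch in text:
        if ch == "\t":
            pad = TAB_WIDTH - len(out) % TAB_WIDTH
            out.extend(GRAY * pad)
        else:
            out.append(ch)
    return "".join(out)


def collapse_gray(internal: str) -> str:
    """Turn runs of gray space back into tabs when the file is saved."""
    out = []
    col = 0
    while col < len(internal):
        if internal[col] == GRAY:
            # A run never extends past the next tab stop.
            stop = col + TAB_WIDTH - col % TAB_WIDTH
            while col < stop and col < len(internal) and internal[col] == GRAY:
                col += 1
            out.append("\t")
        else:
            out.append(internal[col])
            col += 1
    return "".join(out)
```

After expansion, the index of a character in the internal line equals its column, which is the property the following paragraphs rely on.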

These decisions, which mean that a character 
in the internal file representation always maps to 
exactly one character on the screen, lead to an ex- 
tremely nice property. We have mentioned that PEN 
uses three coordinate spaces, namely buffer, window 
and screen. It turns out that each coordinate 
space is trivially related to the others - they are 
all simple offsets of each other. In other words, 
to get the buffer line number of the third line in 
a window, one adds three to the buffer coordinates 
of the upper left hand corner of the window. To 
get the screen coordinates of this line, one adds 
three to the screen position of the upper left hand 
corner of the window. As mentioned above, PEN 
maintains the buffer and screen coordinates of the 
upper left hand corner of the window in the window 
structure. 
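Because the three spaces differ only by offsets, the conversions reduce to additions and subtractions. The Python sketch below is illustrative only; `WindowOrigin` and the function names are hypothetical, and window rows and columns are counted from zero, so the first line of a window has offset 0.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class WindowOrigin:
    """The two corner positions the window structure is said to hold."""
    buf_row: int   # buffer coordinates of the window's upper left corner
    buf_col: int
    scr_row: int   # screen coordinates of the same corner
    scr_col: int


def window_to_buffer(w: WindowOrigin, row: int, col: int) -> Tuple[int, int]:
    return (w.buf_row + row, w.buf_col + col)


def window_to_screen(w: WindowOrigin, row: int, col: int) -> Tuple[int, int]:
    return (w.scr_row + row, w.scr_col + col)


def screen_to_buffer(w: WindowOrigin, srow: int, scol: int) -> Tuple[int, int]:
    # Screen -> window -> buffer is just two offsets applied in turn.
    return window_to_buffer(w, srow - w.scr_row, scol - w.scr_col)
```

A command handler that receives a screen position can therefore find the buffer position it refers to without consulting the text of the line.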

This three coordinate space model greatly sim- 
plifies the command handling routines, since it is 
simple to convert a screen position to a buffer po- 
sition. It also makes it easy to manipulate shaped 
regions of text, such as rectangles. This is very 
hard to do with a data representation which doesn't 
promise that a single character in the buffer will 
always map into a single character on the screen. 
Most editors do not support shaped regions of text. 

Margins and Marks 

PEN draws left and bottom margins to separate 
windows. There are flags in the window structure 
which enable the display of either or both margins 
and indicate whether the bottom margin must be
redrawn. There are also flags for each line which 
determine if the left margin for that line must be 
redrawn. 

PEN differs from many other editors in that it 
can display a mark in each window. The window 
structure contains a pointer to this mark, along 
with the window coordinates at which the mark was 
last shown. After all the lines in the window have 
been updated, PEN checks to see if the displayed 
mark is still valid. If necessary it erases the 
old mark by restoring the character at that posi- 
tion, and displays the mark at its new position. 
Parts of the display module which change the screen 
update this saved mark position if necessary. For 
example, the scroller must notice if it has moved 
the mark, and the line display section must notice 
if it has overwritten it. 

Terminal Independence 

PEN reads an initialization file which con- 
trols the input and output translations. PEN uses 
a single table to perform output translation. We 
assign a code to each of the output sequences PEN 
uses, for example the "cursor-up" sequence, which 
moves the cursor to the previous line on the 
screen. 




PEN will use the following terminal features, 
if available: 

. carriage return 

. move cursor home, up, down, right, left
. move cursor to (x,y) position 
. clear screen 
. clear to end of line 
. insert/delete line 
. insert/delete n lines 

. various other terminal specific scrolling al- 
gorithms 

. character insert mode 
. special graphics characters 

PEN deals with direct cursor addressing via 
three sequences and a terminal dependent numeric 
parameter, which is also in the initialization 
file. We defined eight "standard" cursor motion 
terminal types. These eight types result from the 
enumeration of three orthogonal properties in cursor
motion protocols. The properties are: which coordinate
(x or y) should be sent first, whether the
coordinates are to be sent as a string of ASCII 
characters or as a binary value, and finally if the 
upper left hand corner of the screen should be 
treated as (0,0) or as (1,1). Given this informa- 
tion, PEN sends out the first sequence, followed by 
the first coordinate, the second sequence, the 
second coordinate and finally the third sequence. 
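The scheme of three sequences and three yes/no properties can be sketched in Python as follows. The names `CursorSpec` and `cursor_to` are hypothetical, and the text does not say how PEN packs the three properties into its numeric parameter, so they appear here as separate booleans. The `offset` field is also an assumption: it models the constant (such as 40 octal) that some terminals add to a binary coordinate. The two example specifications are modeled on ANSI-style and VT52-style cursor addressing.

```python
from dataclasses import dataclass


@dataclass
class CursorSpec:
    lead: str            # first sequence, sent before the first coordinate
    middle: str          # second sequence, sent between the coordinates
    trail: str           # third sequence, sent after the second coordinate
    x_first: bool        # send x (column) before y (row)?
    as_text: bool        # True: ASCII decimal digits; False: one binary character
    one_based: bool      # is the upper left hand corner (1,1) rather than (0,0)?
    offset: int = 0      # assumed: constant added to a binary coordinate


def cursor_to(spec: CursorSpec, x: int, y: int) -> str:
    """Build the command that moves the cursor to zero-based column x, row y."""
    def encode(value: int) -> str:
        if spec.one_based:
            value += 1
        return str(value) if spec.as_text else chr(value + spec.offset)

    first, second = (x, y) if spec.x_first else (y, x)
    return spec.lead + encode(first) + spec.middle + encode(second) + spec.trail


# ESC [ row ; column H, with decimal coordinates counted from 1.
ANSI_STYLE = CursorSpec("\x1b[", ";", "H", x_first=False, as_text=True, one_based=True)
# ESC Y row column, each coordinate sent as one character biased by 40 octal.
VT52_STYLE = CursorSpec("\x1bY", "", "", x_first=False, as_text=False,
                        one_based=False, offset=0o40)
```

Supporting a new terminal that fits the mold then means writing a new `CursorSpec`, not new code, which is the point of the table driven approach.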

Most terminals use cursor motion sequences 
which fit into this scheme. Our experience is that 
we stand a good chance of being able to support a
new kind of terminal without changing the editor. 
We have also made provisions for terminals that do 
not fit into this mold, such as the Ann Arbor, by 
defining special terminal type codes which invoke 
terminal-specific functions. Fortunately, this 
does not seem to happen often. 

Terminal scrolling protocols are generally in- 
compatible with each other and therefore do not 
lend themselves to the simple table driven approach 
we used for direct cursor addressing. The display 
module supports two general scrolling schemes, us- 
ing insertion and deletion of both single and mul- 
tiple lines on the screen. All other terminals 
that have special scrolling functions, such as the 
VT100 and Concept 100, require special routines in 
the editor. PEN reads a numeric parameter from the 
initialization file to determine which type of local
scrolling, if any, the terminal supports. Ideally, a
terminal would scroll an arbitrary rectangular region
by an arbitrary amount, either vertically or
horizontally, in a single command, but very few
provide such support.
Terminal scrolling support is essential to provid- 
ing reasonable responsiveness at very low line 
speeds. 

PEN performs input translation via two tables. 
The input process converts arbitrary sequences of 
input characters into internal function codes. The 
first table, which PEN uses for single character 
translation, consists of a 128 byte character array.
Each cell contains a function code or a
special code which indicates that the character is 
the first character of a multi-character input se- 
quence. PEN keeps these sequences in the second 
table, which is sorted; we use binary search when
looking up a sequence. We wrote a small
initialization file compiler, which takes an easily 
edited text input file, and produces an initializa- 
tion file, which is in binary form. The compiler 
will reject any ambiguous sequences. A sequence is 
ambiguous if it is an initial subsequence or an ex- 
tension of another sequence. 
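The two-table lookup and the compiler's ambiguity check can be sketched in Python. Everything named here is hypothetical (`PREFIX`, `find_ambiguous`, `build_tables`, `translate`), function codes are assumed to be positive integers with 0 meaning "unbound", and the binary initialization file format is not modeled. The ambiguity check relies on a property of sorted order: if any sequence is an initial subsequence of another, it is also an initial subsequence of its immediate successor.

```python
import bisect
from typing import Dict, Iterator, List, Tuple

PREFIX = -1   # hypothetical marker: first character of a longer sequence


def find_ambiguous(sequences) -> List[Tuple[str, str]]:
    """Return adjacent sorted pairs (a, b) where a is an initial subsequence of b.

    The result is non-empty exactly when some ambiguity exists.
    """
    ordered = sorted(sequences)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b.startswith(a)]


def build_tables(bindings: Dict[str, int]):
    """Build the 128-entry single character table and the sorted sequence table."""
    conflicts = find_ambiguous(bindings)
    if conflicts:
        raise ValueError(f"ambiguous sequences: {conflicts!r}")
    single = [0] * 128
    multi = []
    for seq, code in bindings.items():
        if len(seq) == 1:
            single[ord(seq)] = code
        else:
            single[ord(seq[0])] = PREFIX
            multi.append((seq, code))
    multi.sort()
    return single, multi


def translate(single: List[int], multi: List[Tuple[str, int]],
              chars: Iterator[str]) -> int:
    """Consume input characters until one function code is recognized."""
    first = next(chars)
    code = single[ord(first)]
    if code != PREFIX:
        return code
    keys = [seq for seq, _ in multi]
    seq = first
    while True:
        i = bisect.bisect_left(keys, seq)      # binary search in the sorted table
        if i < len(keys) and keys[i] == seq:
            return multi[i][1]
        if i == len(keys) or not keys[i].startswith(seq):
            return 0                           # nothing bound to this input
        seq += next(chars)
```

Rejecting ambiguous sequences at compile time is what lets `translate` stop at the first exact match without waiting to see whether more characters follow.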

PEN remembers the screen position of the cur- 
sor at all times and uses local cursor motions to 
reposition the cursor when they are optimal. A 
numeric parameter in the initialization file tells 
the editor how the cursor acts when it hits the 
right edge of the screen. If this is not speci- 
fied, the editor uses a direct cursor addressing 
command after writing a character in the last 
column of the screen. On some terminals, such as
the Tektronix 4025, local cursor optimization
greatly improves performance. 


Conclusions 

As of this writing, PEN is almost eighteen 
months old. We believe that we have succeeded in 
writing a high quality editor for a minicomputer 
environment. Looking over the techniques we used, 
we feel most are simple, or certainly no more dif- 
ficult than alternate approaches which don't work 
as well. 

The things we find most interesting about the 
PEN display module are its goal oriented approach 
to the problem of interruptibility, its independence
from the rest of the editor, and its scrolling
optimization algorithm. We think that the one
to one mapping between the database and the screen 
representations has been an important factor in 
PEN's simplicity and efficiency. PEN also demonstrates
that terminal independence and multiple
window support need not be as difficult as they ap- 
pear. Finally, we hope our work will provide a 
good starting point for new editor and video pro- 
gram development. 





Bibliography and References 



1. Anderson, Owen T. "The Design of an Editor-
Writing System," S. B. Thesis, Dept. of Physics,
Massachusetts Institute of Technology,
Cambridge Mass., February 1979.

2. Bilofsky, Walter. "The CRT Text Editor NED -
Introduction and Reference Manual," R-2176-ARPA,
Rand Corporation, Santa Monica, Ca., December 1977.

3. Ciccarelli, Eugene. "An Introduction to the
EMACS Editor," A. I. Memo No. 447, Artificial
Intelligence Laboratory, Massachusetts Institute
of Technology, Cambridge Mass., January 1978.

4. Davis, Charles Roger. "A Software Foundation
for an Office Based Document Preparation System,"
S. B. Thesis, Dept. of Electrical Engineering
and Computer Science, Massachusetts Institute of
Technology, Cambridge Mass., May 1979.

5. Greenberg, Bernard S. "Terminal Features
Memo," Multics Technical Bulletin No. 419,
Honeywell Inc., Cambridge Mass., July 16, 1979.

6. Irons, E. T., and Djorup, F. M. "A CRT Editing
System," Communications of the ACM, January
1972.

7. Joy, William. "Ex Reference Manual," Computer
Science Division, Dept. of Electrical Engineering
and Computer Science, University of California,
Berkeley, April 1979.

8. Stallman, Richard M. "EMACS: The Extensible,
Customizable, Self-Documenting Display Editor,"
A. I. Memo No. 519, Artificial Intelligence
Laboratory, Massachusetts Institute of Technology,
Cambridge Mass., June 22, 1979.

9. Weinreb, Daniel L. "A Real-Time Display-
Oriented Editor for the Lisp Machine," S. B.
Thesis, Dept. of Electrical Engineering and
Computer Science, Massachusetts Institute of
Technology, Cambridge Mass., January 1979.





The Implementation of Etude,

An Integrated and Interactive Document Production System


Michael Hammer, Richard Ilson, Tim Anderson, Edward Gilbert,

Michael Good, Bahram Niamir, Larry Rosenstein, and Sandor Schoichet


Laboratory for Computer Science 
Massachusetts Institute of Technology 
Cambridge, Massachusetts 02139 


1. Introduction 

Etude is an experimental text processing system that is
being developed in order to formulate and evaluate new 
approaches to the design of user interfaces for office 
automation tools. The primary design goal for Etude is to 
provide the user with substantial functionality in the editing
and formatting of documents in the context of a system that 
is easy to learn and use. 

Office workers can now have access to the technology that
will allow them to produce documents of typeset quality,
due to the dramatic increase in quality and decrease in cost
of ever more powerful and flexible output devices, such as 
laser and electrostatic printers. However, the control of 
such equipment as provided by conventional formatting 
systems introduces substantial complexity into the docu- 
ment production process and effectively requires that the 
operator of the system be an amateur typographer. The 
commands the operator must employ to specify the 
appearance of a document are low-level and very detailed. 
Moreover, the operator must engage in a lengthy process 
consisting of initial specifications, printing on the output 
device, revision of specifications, additional printing, and so 
on, until the output has the desired appearance. 

Etude seeks to address these problems by providing an
environment in which the document is displayed to the
operator as he is working on it, and in which the
appearance of this document is specified in high-level terms
based on the natural structure of the document. Etude is
designed to operate on a high-resolution screen which can
present to the operator a representation of a document as it
would be produced on a typeset quality output device.
Formatting information is provided by specifying the type
of the document (letter, report, memo, or the like), and by
identifying the components of the document that are
associated with its particular type (such as return address,
salutation, body, and closing for a letter). Etude has drawn
from the Scribe system in this regard [8, 9]. The Etude
system has the responsibility of translating these descriptors
into detailed formatting instructions for the screen and the
printing device.

Permission to copy without fee all or part of this material
is granted provided that the copies are not made or distributed
for direct commercial advantage, the ACM copyright
notice and the title of the publication and its date appear,
and notice is given that copying is by permission of the
Association for Computing Machinery. To copy otherwise,
or to republish, requires a fee and/or specific permission.

© 1981 ACM 0-89791-043-5/81/0600/0137 $00.75

Etude also seeks to support and enhance the entire process 
by which an operator interacts with a system to produce a 
document, encompassing both the editing and formatting
functions. The design of the user interface plays the 
dominant role in achieving this end. Etude seeks to lower 
the anxiety factor typically associated with word processing 
and other computer-based document production systems 
by providing a command structure that is accessible and 
comprehensible to a novice user, together with a variety of 
user aids. These include on-line menu and help facilities, as 
well as the ability to "undo" any completed activity; the
latter makes the entire document production process less 
susceptible to user error. Moreover, Etude seeks to avoid
the conflict between a system that is easy for a novice to 
learn and one that is convenient for an experienced user. 
This is accomplished by providing the user with a diversity 
of interaction modes, ranging from succinct command 
codes through specialized function keys to detailed menus 
and prompts. The user has the choice of which mode to 
employ, and can readily shift from one to another. All 
share a consistent underlying framework. 

The Etude user interface is summarized in Section 2 below; 
further details may be found in Hammer et al. [1] and the 
complete specifications [2]. While Etude does resemble a 
number of other systems in some of its individual con- 
structs (notably Scribe and Bravo [4]), it also provides a 
number of innovative features, and is unique in its 
integration of a wide range of facilities and in its approach 
to the total document production process. A prototype
implementation of the Etude system has been completed,
and a second version is now being developed. In 
attempting to implement the range of Etude facilities with a 
modicum of efficiency (although performance was not our 
principal design goal for the prototype), a number of major 
issues relating to the construction of text processing systems 
were confronted. The purpose of this paper is to identify 
these issues and describe the approaches taken to them. 
Section 3 summarizes the overall architecture of the Etude 
prototype implementation, and Section 4 describes the way 
in which Etude represents documents: this representation is 
fundamental to all that follows. Sections 5 and 6 discuss the 
ways in which this representation is used to support editing 
and formatting, and document display on the screen. 
Section 7 summarizes the approach taken to handling the 
user interface, especially the facilities that support the user. 
Section 8 describes how the Etude system interacts with its 
terminal device. 

2. Overview of Etude 

Etude is designed to operate on a hardware base that 
provides a high-resolution bit-map display device. (Etude 
may also be used with conventional CRT screens, but only 
a page-size bit-map display will take advantage of all of
Etude's capabilities.) The Etude operator sees the display 
divided into a number of windows. The center portion of 
the screen is the text window, in which a full page of 
formatted text is displayed. This page may include text in a 
variety of type styles and sizes, may be right-justified, may 
exhibit proportional spacing, and may be organized in a 
variety of page layout formats, as determined by the 
document's component specifications. The display is 
intended to represent the appearance of the document as it 
would be printed by a typeset quality output device; it is 
constantly maintained to reflect the current status of the 
document as it is changed in the course of the editing 
process. The left margin of the page serves as the format
window, in which the components of the document's
structure are indicated. (Etude has several user-selectable
pagination and display modes, which determine whether 
page-breaks are recomputed dynamically or by explicit 
request, whether components are displayed or suppressed 
in the format window, and so on.) 

Part of the screen is reserved for interaction and status 
windows: the former displays user commands as they are 
entered and system responses, as well as error information; 
the latter shows a variety of contextual information. Help 
information and menus are displayed on request in special 
windows that “pop up” on the screen. These windows are 
placed so as not to obscure the area of current interest to 
the operator, which in general is defined to be the area 
around the position of the cursor. 


Etude commands are structured like English imperatives; 
they consist of a verb phrase and up to two noun phrases. 

A noun phrase consists of a series of modifiers and an 
object. Etude provides a number of basic objects
(character, word, sentence, line, paragraph, column, page, 
and document) that are widely used in many different 
contexts, as well as a set of document-specific components. 
There are four basic modifiers (next, previous, start-of, and 
end-of); any positive integer may also serve as a modifier. 
Modifiers and objects may be combined as in English; e.g.,
start-of paragraph, end-of next 3 sentence(s), previous 10
line(s).
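As an illustration of how regular this noun phrase grammar is, the Python sketch below parses a typed-out phrase into its modifiers and object. It is not Etude's parser (the prototype was written in CLU); `parse_noun_phrase` is a hypothetical name, and only the basic objects listed above are recognized, not the document-specific components.

```python
from typing import List, Tuple, Union

MODIFIERS = {"next", "previous", "start-of", "end-of"}
OBJECTS = {"character", "word", "sentence", "line",
           "paragraph", "column", "page", "document"}

Modifier = Union[str, int]


def parse_noun_phrase(text: str) -> Tuple[List[Modifier], str]:
    """Split 'end-of next 3 sentence(s)' into (['end-of', 'next', 3], 'sentence')."""
    words = text.split()
    if not words:
        raise ValueError("empty noun phrase")
    *modifier_words, obj = words
    if obj.endswith("(s)"):
        obj = obj[:-3]                # accept the optional plural marker
    if obj not in OBJECTS:
        raise ValueError(f"unknown object: {obj!r}")
    modifiers: List[Modifier] = []
    for word in modifier_words:
        if word in MODIFIERS:
            modifiers.append(word)
        elif word.isdigit() and int(word) > 0:
            modifiers.append(int(word))   # any positive integer is a modifier
        else:
            raise ValueError(f"unknown modifier: {word!r}")
    return modifiers, obj
```

For the three phrases quoted in the text, the sketch yields `(['start-of'], 'paragraph')`, `(['end-of', 'next', 3], 'sentence')`, and `(['previous', 10], 'line')`.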

Etude verbs may be divided into five categories as 
determined by their function: 

- User aids (undo, help, menu, go ahead, cancel, again)

- Cursor movement (go-to, →, ↓, ←)

- Region definition (begin, end) 

- Editing (erase, copy, move, label, back-space, back- 
word) 

- Formatting (make, remove, change, split, combine) 

The editing verbs are similar to those encountered in 
conventional text-editing systems. The editing verbs (and 
some others) are applied to a region of text, which may be 
identified by a noun phrase or by means of explicit region
definition (see below). (The prototype system implements 
only a basic set of editing verbs; a more complete set will be 
provided in the next version.) 

The formatting verbs are used to manipulate the compo- 
nent structure of a document. Make is used to associate a 
component with already existing text while change and 
remove alter or remove a component associated with some 
text. Split and combine provide additional ways to 
manipulate the component structure. 

To move the cursor, the operator may use either the cursor 
control keys (↑, →, ↓, and ←), or the go-to command
followed by a noun phrase. If the noun phrase starts with a 
non-numeric modifier, the go-to command may be omitted. 

Region definition is not done as an independent operation 
in Etude, but rather in the context of a command such as 
move, erase, or label. A region is a sequence of text, and 
may be defined either by using a single noun phrase, or by 
using begin and end to bracket a series of explicit cursor 
movements. 

The user aids are critical elements in the plan to make 
Etude truly easy-to-use. As implemented, undo will reverse 
the effects of the last operation performed. All operations
except labeling, anchoring, and region definition are 
undoable in the prototype version. Although the prototype 
only allows the most recent operation to be undone, the 
system architecture supports a completely general reversal 
facility, and a richer user interface would allow an arbitrary
number of operations to be undone. Help is available at 
any time in the session, and provides information about
what the user has been doing and what his options are at 
the current point. Cancel terminates the specification of an 
operation, while again repeats the last command. (This 
latter facility is generally available only for the cursor 
movement and editing commands.) 

We plan to design a keyboard specifically for the Etude 
system. The prototype uses a conventional ASCII 
keyboard: the Etude verbs and objects can be directly 
keyed via control and escape sequences. Any verb or object 
name can also be explicitly typed in full, or selected from a 
menu displayed to the operator. 

Go ahead is used for the confirmation of operations such as 
erase; the area affected by the command is highlighted, and 
execution occurs only when the user hits go ahead. 
Moreover, when typing in a name during an Etude 
command (usually that of a component), typing go ahead 
instructs the system to automatically “complete” the name 
from what has been typed. If the string typed in so far is 
ambiguous (i.e., a prefix of more than one name), Etude 
informs the user of the ambiguity. The menu command 
may then be used to display available options (in the form 
of legitimate names that have the typed string as prefix). 
Whenever menu is pressed, whether in this situation or in 
others where a menu can be used, the available options are 
displayed in a “pop-up” window. The cursor control keys 
can then be used to select from the menu; the currently 
selected item is highlighted on the screen, and each cursor 
control key moves the selection over by one item in the 
menu. Pressing go ahead finalizes selection from the menu. 
Alternatively, the user may decide to type in the name of an 
item listed in the menu and then press go ahead. 

3. System Architecture 

An initial implementation of Etude, which focused on 
exploring some of the more challenging implementation 
issues, was constructed in the CLU programming 
language [6]. This system currently runs on a 
DECSYSTEM-20 and drives a Nu terminal [12], which has 
a bit-map display. The Nu terminal runs a virtual terminal 
interface (VTI) program, which provides operations to 
display multiple type faces and manipulate rectangular 
screen areas. In this section, we present a brief description 
of the software architecture of the entire Etude system. 

The implementation of the Etude system is divided into 
two parts: the user interface and the editor/formatter/
display. The user interface is responsible for parsing the 
keystrokes entered by the user, interpreting them, and then 
invoking the appropriate internal operations to realize the 
desired function. Most of the time a function of the editor, 
which is responsible for making changes to the document, 
is invoked. If the user's command does not involve 
changing the document, the user interface handles the 
function directly; this is true for the help, menu, and cancel 
functions. After the appropriate internal operations have 
been performed, the user interface updates the session state, 
which is a record of what actions have been performed, and 
is mainly used for the purpose of implementing help and 
undo. The formatter, which reformats the regions in the 
document that have been changed by the just completed 
command (and that appear on the screen), is then called. 
Finally, the display system is invoked, which updates the 
screen image to reflect the current state of the document. 

For example, if the user types a text character, such as “i," 
when he is not in the midst of issuing a command, the 
character is to be inserted into the document. The user 
interface will instruct the editor to insert the character “i” 
into the document at the position marked by the cursor and 
will update the session state to indicate that the character 
was so inserted. The user interface invokes the formatter, 
which reformats at least the line into which the “i” was 
inserted, and possibly more (if the line “overflowed” as a 
result of the insertion). The user interface then asks the 
display system to update the screen. The display system 
redisplays at least the line containing the new “i”; again, 
more lines may be redisplayed if the insertion of the “i” 
caused changes to other lines in the document. After all 
these operations are completed, the user interface waits for 
more keystrokes from the user. 
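
The insertion sequence above can be sketched in a few lines of
Python. This is only an illustration of the order of calls, not
the CLU implementation; the class and method names, and the use
of a list of strings as the document, are assumptions made for
the sketch.

```python
class Session:
    """Toy editing session (hypothetical names throughout)."""

    def __init__(self, lines):
        self.lines = list(lines)      # stand-in for the document
        self.cursor = (0, 0)          # (line index, column)
        self.state = []               # session state: record of actions
        self.unformatted = set()      # lines the formatter must examine
        self.changed = set()          # lines the display must redraw
        self.screen = list(lines)     # what is currently displayed

    def insert(self, ch):             # editor
        row, col = self.cursor
        line = self.lines[row]
        self.lines[row] = line[:col] + ch + line[col:]
        self.cursor = (row, col + 1)
        self.unformatted.add(row)
        self.changed.add(row)

    def reformat(self):               # formatter
        # A real formatter would re-break overflowing lines here;
        # this stub only acknowledges the lines it was asked to examine.
        self.unformatted.clear()

    def redisplay(self):              # display system
        for row in sorted(self.changed):
            self.screen[row] = self.lines[row]
        self.changed.clear()


def type_character(session, ch):
    """User interface: the fixed order of calls for one text keystroke."""
    position = session.cursor
    session.insert(ch)
    session.state.append(("insert", ch, position))
    session.reformat()
    session.redisplay()


session = Session(["edtor", "formatter"])
session.cursor = (0, 2)
type_character(session, "i")
```

After the call, only the first line has been redrawn, and the
session state holds one entry from which the insertion could
later be explained (help) or reversed (undo).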

All changes to the document follow this same basic pattern. 
A more complicated command, such as erase 3 lines, 
requires additional work from the user interface to parse 
the command, to invoke more general operations of the 
editor, and to record the operation in the session state. The 
operations of the formatter and display system remain 
essentially the same, although larger regions of text may 
need to be reformatted and redisplayed. 

The user interface handles a help request by examining the 
session state and constructing a temporary document 
containing the text of the help information. It allocates an 
area on the screen for the text of the help information, then 
calls the formatter and display system on this temporary 
document, which results in the appearance of the help 
information on the screen. 

4. The Etude Document Representation 

In order to provide “real-time" and high-level formatting, 
the Etude system makes use of a rich representation of the 
document being processed. 

Etude deals with three aspects of a document: 

- the content, the sequence of characters that form its 
text; 


- the internal structure, the organization and classifi- 
cation of the ideas and information contained in the 
document; 

- the outward appearance, the arrangement of text that 
determines the way the document appears, either 
when printed or displayed. 

Each of these aspects is represented in Etude's document 
representation. 

The content of a document is represented by a simple linear 
sequence of characters, stored in a list structure called the 
text chain. 

Both the internal structure and outward appearance of a 
document are modeled by hierarchies. These are distinct 
hierarchies that cannot be directly related. For example, 
the body of a letter (part of its internal structure) might be 
completely contained within a single page (outward 
appearance); or it might occupy parts of two pages; or it 
might extend over several pages, completely containing one 
or more of these pages. 

The representations of both the internal structure and 
outward appearance of a document, described in the 
following two sections, are woven over the content of the 
document. In order to maintain the relationship between 
these structures and the content of the document, markers 
may be inserted into the text chain. These markers serve to 
relate the internal structure and outward appearance of the 
document to each other by delimiting the segments of the 
text chain associated with components of each structure. 

4.1. Representation of Internal Structure 

4.1.1. Subdocuments 

The total content of a document is not actually stored as a 
single text chain, but is broken up into a number of disjoint 
pieces called subdocuments. Subdocuments are document 
components that have no sequential or containment rela- 
tionships with one another (although, as will be discussed 
later, they may have spatial relationships). The content of 
each subdocument is stored as a single text chain. The 
major subdocument is typically the main text; in some 
documents (such as a simple business letter), it is the only 
subdocument. More complex documents might have 
several subdocuments: running heads and feet, footnotes, 
and figure captions would all be represented as separate 
subdocuments, as would the two versions of the main text 
in a dual-language book. 

4.1.2. Components 

The internal structure of each subdocument is modeled by 
a hierarchical tree structure. The objects in the tree 
structure are called components. A component has the 
following information associated with it: 


- Each component has a single owner component and an 
array of owned components (children), used to imple- 
ment the hierarchical tree structure. 

- Each component is of a specific class, which identifies 
the kind of internal structure component that it 
represents. Typical component classes are 

"return address,” "address,” "body," and 
“paragraph.” 

- Each component identifies a region of its subdocument's 
text chain. This relationship is represented by 
the presence of a begin component marker and an end 
component marker in the text chain, which are pointed 
to from the component. The characters contained in 
any component are thus accessible. The component 
structure is also accessible from the text chain because 
each component marker (begin or end) has a pointer to 
its associated component. 
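
The two-way linkage between components and the text chain can
be sketched as follows. This is a minimal illustration, not
Etude's CLU data structure: a Python list stands in for the
text chain, and the names Marker, Component, text_of, and
containing are hypothetical.

```python
class Marker:
    def __init__(self, kind, component):
        self.kind = kind              # "begin" or "end"
        self.component = component    # pointer back to the component


class Component:
    def __init__(self, cls, owner=None):
        self.cls = cls                # component class, e.g. "paragraph"
        self.owner = owner
        self.owned = []
        self.begin = Marker("begin", self)
        self.end = Marker("end", self)
        if owner is not None:
            owner.owned.append(self)


def text_of(chain, component):
    """Characters lying between a component's two markers."""
    start = next(i for i, x in enumerate(chain) if x is component.begin)
    stop = next(i for i, x in enumerate(chain) if x is component.end)
    return "".join(x for x in chain[start:stop] if isinstance(x, str))


def containing(chain, position):
    """Classes of the components containing a position, innermost first.

    Assumes the markers in the chain are properly nested.
    """
    innermost = None
    for item in chain[:position]:
        if isinstance(item, Marker):
            if item.kind == "begin":
                innermost = item.component
            else:
                innermost = item.component.owner
    path = []
    while innermost is not None:
        path.append(innermost.cls)
        innermost = innermost.owner
    return path


body = Component("body")
first = Component("paragraph", body)
second = Component("paragraph", body)
chain = [body.begin, first.begin, *"Hi.", first.end,
         second.begin, *"Bye.", second.end, body.end]
```

Here text_of follows the pointers from a component into the
chain, while containing goes the other way, from markers in the
chain to components and then up through their owners.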

4.2. Representation of External Structure 

The external structure, like the internal structure, is 
represented as a hierarchy built over the content of the 
document. Unlike the content or the internal structure, 
however, the outward appearance is not directly specified 
by the user. Instead, it is constructed automatically by the 
Etude system. 

4.2.1. Pages 

Just as the internal structure of the document is divided up 
into a number of subdocuments, the external structure is 
divided up into a sequence of pages. A page in our 
representation has the conventional meaning: the total 
arrangement of printed markings on one side of a sheet of 
paper (or on a display screen). 

4.2.2. Boxes 

The box is the fundamental unit out of which the outward 
appearance of a document is built. Our use of boxes is 
modeled after the TeX system [3], although we have 
modified and extended the concept somewhat. The 
outward appearance of each page is represented as a 
hierarchical structure of boxes in a fashion analogous to the 
internal structure of subdocuments. 

A box is a two-dimensional object with rectangular shape. 
All boxes have a reference point and three associated 
measurements: a height above the reference point, a depth 
below it, and a width. 

A character is represented by the simplest kind of box, 
having only these three basic attributes. Its reference point 
corresponds to the base line of the character. The base line 
of a character is an imaginary line at the top of the 
descender of a character with a descender (for example, “g” 
or “p”), or the bottom of a character if it has no descender 
(for example, “a” or “b”). 


When boxes are joined together to form larger boxes, they 
are either joined horizontally or vertically. If they are 
joined horizontally, they are all aligned on their reference 
points and the resulting box is called a line. Thus, when 
constructing a line of characters, the base lines of the 
characters will be aligned properly without further effort. 

If boxes are joined vertically, they are also aligned on their 
reference points, and the resulting box is called a column. 
The typical column of text is a box constructed of lines. A 
line's reference point is normally at the left edge of the line, 
because the reference point of a line is the reference point 
of its first character. Thus, lines that are joined together to 
form a column are aligned on their left edges. 
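
The measurements of a joined box follow directly from those of
its parts, as the sketch below illustrates. The names Box,
make_line, and make_column are hypothetical. Where a column's
own reference point sits is not stated above; the sketch
assumes it is the reference point of the first line, and it
stacks lines with no glue between them.

```python
from dataclasses import dataclass


@dataclass
class Box:
    height: float   # extent above the reference point
    depth: float    # extent below the reference point
    width: float


def make_line(boxes):
    """Join boxes horizontally, aligned on their reference points."""
    return Box(height=max(b.height for b in boxes),
               depth=max(b.depth for b in boxes),
               width=sum(b.width for b in boxes))


def make_column(lines):
    """Join lines vertically, left edges aligned.

    Assumption: the column's reference point is that of its first
    line, and lines are stacked with no glue between them.
    """
    first, rest = lines[0], lines[1:]
    return Box(height=first.height,
               depth=first.depth + sum(b.height + b.depth for b in rest),
               width=max(b.width for b in lines))


a = Box(height=5, depth=0, width=5)   # no descender
g = Box(height=5, depth=2, width=5)   # descender below the base line
b = Box(height=7, depth=0, width=5)   # tall character
line = make_line([b, a, g])
column = make_column([line, line])
```

Because the line takes the maximum height and the maximum depth
separately, the base lines of its characters coincide without
any per-character adjustment.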

A fourth kind of box, glue, is inserted into a line or column 
box when extra space is needed between its component 
boxes. There are a variety of different classes of glue. For 
example, the two classes of glue normally found in a line 
are inter-word glue and inter-sentence glue, and the glue 
normally found in a column is inter-line glue. 

Glue has three more attributes in addition to its class: a 
natural space, a stretch, and a shrink. A glue's class 
determines its particular values of natural space, stretch, 
and shrink. The natural space is the desired or “optimal” 
size of a particular blank space in a line or column. The 
stretch and shrink components determine how much the 
normal blank space may be expanded or contracted if it is 
necessary to increase or decrease the size of a blank space in 
order to, for example, justify a line of text. 
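
One way the three glue attributes could be used to justify a
line is sketched below. How Etude actually distributes the
extra space is not described here; the sketch assumes the
TeX-style rule of sharing it in proportion to each glue's
stretch (or shrink), and never shrinking a glue by more than
its shrink value. The names Glue and set_glue and the numeric
values are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Glue:
    natural: float   # desired size of the blank space
    stretch: float   # how much it may grow
    shrink: float    # how much it may contract


def set_glue(glues, text_width, target_width):
    """Return the size given to each glue so the line fills target_width."""
    natural_width = text_width + sum(g.natural for g in glues)
    excess = target_width - natural_width
    if excess >= 0:
        total = sum(g.stretch for g in glues)
        ratio = excess / total if total else 0.0
        return [g.natural + ratio * g.stretch for g in glues]
    total = sum(g.shrink for g in glues)
    # Assumed limit: a glue never shrinks by more than its shrink value.
    ratio = max(excess / total, -1.0) if total else 0.0
    return [g.natural + ratio * g.shrink for g in glues]


inter_word = Glue(natural=4, stretch=2, shrink=1)
inter_sentence = Glue(natural=6, stretch=4, shrink=1)
stretched = set_glue([inter_word, inter_sentence], 80, 96)
shrunk = set_glue([inter_word, inter_sentence], 80, 89)
```

With these values the inter-sentence glue absorbs twice as much
of the extra space as the inter-word glue, because its stretch
is twice as large.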

Lines and columns are both constructed boxes, that is, they 
are made up out of smaller boxes. Like components, 
constructed boxes have associated with them an owner box 
and a set of owned boxes which together define the 
hierarchical tree structure of each page’s outward appear- 
ance. 

Each line and column contains a region of a particular 
subdocument's text chain. This relationship is represented, 
as it is for components, by begin owap markers and end 
owap markers inserted into the text chain. (Owaps 
represent the OutWard APpearance, as components repre- 
sent the internal structure.) The outward appearance 
structure is accessible from the text chain because each 
owap marker has a pointer to its associated line or column. 

4.2.3. Layouts and Containers 

In order to make up the arrangement of columns that 
constitutes a page, Etude abandons the “boxes and glue” 
approach adapted from TeX. Instead, it provides a third 
type of constructed box, the page box, which allows its 
component boxes to be located arbitrarily in two-dimen- 
sional space. 


This approach has been taken because there are problems, 
such as page makeup, for which horizontal and vertical lists 
of boxes (lines and columns) are not natural constructs. 

For example, a page in a complex document might have 
two or more columns of text, several cut-outs for 
illustrations, and a running header. The page box allows these 
diverse components to be correctly and directly positioned 
on the page. Although a structure with the same 
appearance could be built out of a complex hierarchy of 
line and column boxes held together with glue, it would be 
cumbersome to do so. Such a structure would also be much 
more difficult to manipulate if it were altered, say by the 
addition or deletion of an illustration. 

Even though page boxes were devised mainly for addressing 
the problem of page layout, they are just one more type 
of constructed box, and so can be included anywhere in the 
hierarchical box structure. 

The arrangement of the components of a page box is 
represented by a layout that locates a set of containers with 
respect to the upper left corner of the page box. (Typically, 
this would be the corner of a sheet of paper, or of the 
screen.) Each container is associated with one of the 
columns of text that is to appear on the page. A container 
may define any rectilinear area, and is represented as the 
union of a set of rectangles. A container may be thought of 
as providing the size and shape constraints under which the 
text of its associated column is to be formatted. By setting 
the width and the horizontal position of each line appro- 
priately as the column is composed, the outline of the 
column's text can be made to conform to any shape the 
container may assume. These component columns may 
come from any of the subdocuments in a document. For 
example, one column may come from a “header” subdoc- 
ument and another from a “bodytext" subdocument. The 
page box will have a pair of begin and end owap markers in 
the text chain of each subdocument that has a portion of its 
text appearing on the page. 
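
The way a container constrains each line can be sketched as a
lookup from a line's vertical position to its horizontal
position and width. The function name line_slot and the tuple
layout are assumptions. The sketch is deliberately
conservative: a rectangle contributes to a line only if it
covers the line's full height, and where a cut-out splits the
available space, the widest piece is used.

```python
def line_slot(rects, top, line_height):
    """Horizontal position and width available to one line of text.

    rects is the container: a list of (x, y, width, height) rectangles
    measured from the upper left corner of the page box. Returns
    (x, width), or None if no rectangle covers the whole line.
    """
    bottom = top + line_height
    spans = sorted((x, x + w) for x, y, w, h in rects
                   if y <= top and bottom <= y + h)
    merged = []
    for lo, hi in spans:
        if merged and lo <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)   # rectangles abut or overlap
        else:
            merged.append([lo, hi])
    if not merged:
        return None
    lo, hi = max(merged, key=lambda span: span[1] - span[0])
    return lo, hi - lo


# An L-shaped container: the upper right is cut out for an illustration.
container = [(0, 0, 30, 40), (0, 40, 60, 60)]
narrow = line_slot(container, 0, 10)
wide = line_slot(container, 40, 10)
```

A column formatter working within this container would ask for
a slot before composing each line, so the first four lines come
out half as wide as the rest.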

In addition to the text containers that have already been 
implemented, image, line art, and table containers will 
eventually be made available in the Etude system. Each of 
these container types will be associated with objects that 
can have their own unique internal representations, de- 
signed to be natural for the manipulations that will be 
applied to them. All that is required for such a box to be 
incorporated into the layout of a page box is that it provide 
the standard small set of box attributes and operations. In 
this manner, Etude allows a variety of different data 
structures to be cleanly integrated into a single overall 
document representation. 


5. Editing and Formatting 

5.1. Representing Changes to the Document 

In order to give the user immediate feedback on the display 
of any changes he makes to the content or internal structure 
of his document, Etude performs incremental formatting 
and incremental redisplay. Incremental formatting is the 
ability to reformat — i.e., reconstruct the outward appear- 
ance of— only those portions of the document that have 
been changed by an editing operation. Similarly, incre- 
mental redisplay is the ability to redisplay only those 
portions of the document that appear on the screen and 
that have been changed. 

As the user edits, the document maintains indications of the 
changes that have been made to it; the outward appearance 
hierarchy is used to keep track of these changes. When a 
change is made to the document, the lines in the changed 
section are marked as changed, and the columns containing 
those lines are also marked as changed. Thus under this 
scheme, the smallest unit of text that is reformatted and 
redisplayed is one line. Although this is not ideal— it might 
be desirable at times to redisplay only a single character— it 
is adequate for our purposes. 

We have only discussed in a vague sense what it means for 
a section of the document to be "changed." We have said 
that such a section, if it appears on the screen, needs to be 
reformatted and redisplayed. Just marking a section of the 
document as “changed” is not adequate to fully represent 
the dynamics of formatting and display. For example, a 
section of the document that has not been changed may still 
need to be reformatted. This may happen when a character 
is deleted from a line; the previous line has not been 
changed, but it still may need to be reformatted, because 
deleting a character in a line may allow a word at the 
beginning of that line to move up to the end of the previous 
line. 

Thus, there are actually two kinds of marking done on the 
document: unformatted marking and changed marking. 
Sections of the document that are potentially unformatted 
as a result of an editing operation are marked unformatted; 
this is an indication to the text formatter that it must 
examine that section and reformat it, if necessary. Sections 
of the document that have actually been altered are marked 
as changed; this is an indication to the redisplay subsystem 
that these sections need to be redisplayed on the screen. 

If a character is inserted or deleted from the text chain, then 
the line containing that character is marked as both changed 
and unformatted. If a component is inserted or deleted 
from the component hierarchy, then all the lines that have 
characters contained in the scope of the component are 
marked as changed and unformatted. 


This marking propagates upward through the outward 
appearance hierarchy. When a line is marked unformatted 
(changed), the column containing it is also marked 
unformatted (changed). 

The text formatter formats a section of the document by 
formatting all the lines in that section that are marked 
unformatted. In doing the formatting, it may unformat and 
change additional lines of the document, which will then be 
marked appropriately. When the formatter finishes format- 
ting all the lines in the section, it marks those lines as 
formatted. The redisplay system, in order to keep the 
screen up-to-date, would then redisplay all the lines that are 
marked as changed, and then mark them as unchanged. 
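
The interplay of the two marks can be illustrated with a toy
column in which deleting a character lets a word move up to the
previous line. Everything here is a simplified sketch with
hypothetical names: lines are plain strings, the "formatter"
only pulls words up from the following line, and overflow is
not handled.

```python
class Line:
    def __init__(self, text, column):
        self.text = text
        self.column = column
        self.unformatted = False
        self.changed = False


class Column:
    def __init__(self, texts):
        self.lines = [Line(t, self) for t in texts]
        self.unformatted = False
        self.changed = False


def mark(line, unformatted=False, changed=False):
    """Set marks on a line and propagate them up to its column."""
    if unformatted:
        line.unformatted = line.column.unformatted = True
    if changed:
        line.changed = line.column.changed = True


def delete_char(column, i, pos):
    line = column.lines[i]
    line.text = line.text[:pos] + line.text[pos + 1:]
    mark(line, unformatted=True, changed=True)
    if i > 0:                       # previous line may now gain a word
        mark(column.lines[i - 1], unformatted=True)


def format_column(column, width):
    """Toy formatter: pull words up into unformatted lines while they fit."""
    for i, line in enumerate(column.lines):
        if not line.unformatted:
            continue
        while i + 1 < len(column.lines):
            nxt = column.lines[i + 1]
            word, _, rest = nxt.text.partition(" ")
            if not word or len(line.text) + 1 + len(word) > width:
                break
            line.text += " " + word
            nxt.text = rest
            mark(line, changed=True)
            mark(nxt, unformatted=True, changed=True)
        line.unformatted = False
    column.unformatted = False


def redisplay(column):
    """Return the indices of the lines redrawn, resetting their marks."""
    redrawn = [i for i, line in enumerate(column.lines) if line.changed]
    for line in column.lines:
        line.changed = False
    column.changed = False
    return redrawn


column = Column(["a bb ccc", "dddd ee"])
delete_char(column, 1, 0)
marks = [(line.unformatted, line.changed) for line in column.lines]
format_column(column, width=12)
texts = [line.text for line in column.lines]
redrawn = redisplay(column)
```

Immediately after the deletion the first line is unformatted
but not changed; it becomes changed only once the formatter
actually moves a word into it.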

5.2. Formatter Architecture 

The structure of the Etude formatting subsystem is stratified 
to handle the reconstruction of each layer of the 
outward appearance hierarchy as independently as possible. 
Four basic modules implement the formatter: 

1. The displaywright manages the various display and 
pagination modes, and controls the invocation of the 
next two modules. (Just as a "shipwright” builds and 
repairs ships, the “displaywright" builds and repairs 
the displayed image of the document. Similarly, the 
pagewright, columnwright, and linewright build and 
repair pages, columns and lines.) 

2. The pagewright is responsible for maintaining and 
rearranging the layout of individual pages as required 
by the user’s actions and the current pagination mode. 
The pagewright works exclusively with containers, that 
is, with the shape, size, and location of the various text 
areas on the page. 

3. The columnwright is invoked within the context of a 
particular container and controls the invocation of the 
linewright in order to build up a single column of text 
that conforms to the size and shape constraints 
imposed by the container. The effect of the pagewright 
and the columnwright operating together is to 
“flow” text through the appropriate areas of the page. 

4. The linewright is responsible for composing each 
individual line of text within the formatting envi- 
ronment defined by the component hierarchy and the 
line width constraints provided by the columnwright. 

5.3. Format Specification 

The text formatter of the Etude system composes the 
outward appearance of a document. Etude’s formatter uses 
a data base of formats for determining how each compo- 
nent in the internal structure of a document should be 
formatted. (This is quite similar to the approach taken by 
the Scribe text formatter.) It derives the formatting 
information from both the data base, which contains a set 
of pre-defined formats for each class of component, and the 
arrangement of components in the internal structure 
hierarchy. 

The data base contains a format specification for each class 
of component known by Etude. The format specification 
includes a number of format attributes, and a value 
specification for each attribute. For example, "type face" is 
an attribute that might have the value specification italic, 
and "right margin" is an attribute that might have the value 
specification "1.5 inches.” 

A component's format specification need only partially 
determine the formatting environment for the component. 
(The formatting environment is a total specification of all 
the typographic attributes and values for the component.) 
For example, the "center" component centers the text 
contained in it. We would want the text to be centered 
within the margins of the document, whatever they 
happened to be. The format specification associated with 
the "center" component would not specify the margins 
between which the text should be centered; rather, the 
margins would be derived from the margins defined for the 
document type. Thus, the desired margins for the centered 
text would be inherited from previous specifications. 
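
Inheritance of this kind can be sketched by layering partial
specifications from the outermost component inward. The data
base contents below are hypothetical (only the italic type face
and the 1.5 inch right margin echo values mentioned above), as
are the names FORMATS and environment.

```python
FORMATS = {
    # Hypothetical entries: each class specifies only some attributes.
    "letter": {"left margin": "1 inch", "right margin": "1.5 inches",
               "type face": "roman", "alignment": "flush left"},
    "center": {"alignment": "centered"},
    "emphasis": {"type face": "italic"},
}


def environment(path):
    """Complete formatting environment for a component.

    path lists component classes from the outermost owner down to the
    component itself; inner specifications override outer ones, and any
    attribute an inner class leaves unspecified is inherited.
    """
    env = {}
    for cls in path:
        env.update(FORMATS.get(cls, {}))
    return env


env = environment(["letter", "center", "emphasis"])
```

The resulting environment is centered and italic, yet keeps the
margins of the enclosing letter, which neither "center" nor
"emphasis" mentions.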

6. Display 

6.1. Goals of the Redisplay Mechanism 

Like the formatter, the redisplay mechanism operates 
incrementally— only those parts of the screen that have 
been changed are redisplayed. Also, parts of the screen that 
have not been changed but are in the wrong position (e.g., 
following the deletion of a line) are moved, through the use 
of a screen bit-map operation (referred to as a block move, 
and implemented by the Virtual Terminal Interface). The 
use of incremental redisplay not only reduces the amount 
of information that must be sent to the display, but also 
reduces the distraction caused by a large part of the display 
changing frequently. 

The redisplay mechanism also supports the screen layouts 
needed for Etude, which uses a number of system-defined 
windows on the screen (text, format, interaction, etc.). 
Etude can also display multi-column documents, and 
provide "pop-up” help and menu windows. 

6.2. Redisplay Approach 

The Etude redisplay mechanism is organized around the 
concept of a column picture, which represents the display 
image of a column of text. Column pictures are used to 
represent not only columns of a document, but also system- 
defined windows. Each column picture contains a pointer 
to the column of text that it contains and information about 
each line of the column that is currently on the screen. 


Column pictures are organized on the screen using window 
objects. Each window has a rectangular shape, a position, 
and a contents. There are two types of windows: basic 
windows, which contain a column picture, and compound 
windows, which are composed of zero or more windows. 

The windows in the system, therefore, are organized into a 
tree structure, with basic windows as the leaves and a 
window that corresponds to the total physical screen as the 
root. 

Etude might display only a portion of a column picture. A 
basic window may be thought of as a rectangular hole in a 
large piece of paper. This piece of paper is put down on 
top of each column picture so that the part of interest is 
visible through the hole. This visible part can then be 
considered as another image, and can also be placed (along 
with other such images) in a compound window. 

The physical screen is updated by calling a special redisplay 
procedure. This procedure traces through the window 
structure, until it reaches the basic windows. Then the 
column picture display routine displays each column 
picture contained in a basic window. This routine will 
update its part of the screen to correspond to the current 
state of the text in the column picture. 
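
The traversal just described amounts to a walk over the window
tree that calls the column picture display routine at each
leaf. A minimal sketch, with hypothetical class names and with
strings standing in for column pictures:

```python
class BasicWindow:
    def __init__(self, x, y, width, height, column_picture):
        self.x, self.y = x, y
        self.width, self.height = width, height
        self.column_picture = column_picture


class CompoundWindow:
    def __init__(self, x, y, width, height, windows=()):
        self.x, self.y = x, y
        self.width, self.height = width, height
        self.windows = list(windows)      # zero or more windows


def redisplay(window, display_column_picture):
    """Walk the window tree; display each column picture found at a leaf."""
    if isinstance(window, BasicWindow):
        display_column_picture(window.column_picture)
    else:
        for child in window.windows:
            redisplay(child, display_column_picture)


screen = CompoundWindow(0, 0, 80, 60, [
    BasicWindow(0, 0, 80, 2, "status"),
    CompoundWindow(0, 2, 80, 50, [BasicWindow(0, 2, 80, 50, "text")]),
    BasicWindow(0, 52, 80, 8, "interaction"),
])
shown = []
redisplay(screen, shown.append)
```

The window sizes and the three-window arrangement are made up
for the example; a real display routine would also clip each
column picture to its window's rectangle.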

6.3. Redisplay Implementation 

Incremental redisplay, along with block moves, is done on a 
column picture basis by the column picture display routine. 
First, the display routine determines what part of the 
column picture has changed. This is done by examining the 
changed marks left in the document representation by the 
editing and formatting operations. The column picture 
display routine looks at these marks, and resets them when 
a line is displayed on the screen. 

There are two passes in redisplaying a column picture. On 
the first, the redisplay routine checks to see if a block move 
can be performed. The routine tries to find the largest 
number of consecutive lines that have not been changed 
but are in the wrong position on the screen. These lines are 
then directly moved into the correct position. 

On the second pass, the routine goes through all the lines 
again, and displays any line that is either changed or in the 
wrong position. A line is displayed by clearing out the part 
of the screen it occupies, and displaying it in full. Thus, any 
change to a line (even the addition of one character to the 
end) will cause that entire line to be re-displayed. 

The information about what lines are on the screen is stored 
in a table associated with each column picture. This table 
contains pointers to the line and to the part of the screen it 
occupies. This information is used on the first pass to 
determine which lines are out of position. After block 
moves are done, the table is updated to reflect the results of 
the moves. 
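
The two passes can be sketched on a toy screen model. In this
sketch, which is an illustration rather than the actual
routine, the table is a list with one entry per screen row
(None for a blank row), lines are identified by name, and at
most one block move is made per redisplay. The function and
parameter names are hypothetical, and both lists are assumed to
have one entry per screen row.

```python
def redisplay(table, wanted, changed):
    """Two-pass redisplay of one column picture.

    table   - line shown on each screen row now (None = blank row)
    wanted  - line that should be shown on each row (same length)
    changed - set of lines carrying the changed mark
    Returns (lines moved as a block, rows redrawn, updated table).
    """
    n = len(wanted)
    where = {line: row for row, line in enumerate(table) if line is not None}

    # Pass 1: longest run of unchanged lines displaced by the same offset.
    best_len, best_row, best_off = 0, 0, 0
    row = 0
    while row < n:
        line = wanted[row]
        old = where.get(line)
        if line is None or line in changed or old is None or old == row:
            row += 1
            continue
        off = old - row
        end = row + 1
        while (end < n and wanted[end] is not None
               and wanted[end] not in changed
               and where.get(wanted[end]) == end + off):
            end += 1
        if end - row > best_len:
            best_len, best_row, best_off = end - row, row, off
        row = end

    screen = list(table)
    if best_len:                                  # the block move
        source = best_row + best_off
        screen[best_row:best_row + best_len] = table[source:source + best_len]

    # Pass 2: redraw every line that is changed or still out of position.
    drawn = []
    for row in range(n):
        if wanted[row] in changed or screen[row] != wanted[row]:
            screen[row] = wanted[row]
            drawn.append(row)
    changed.clear()                               # reset the changed marks
    return best_len, drawn, screen


# Line "b" has been deleted: three lines move up, one row is cleared.
moved, drawn, table = redisplay(["a", "b", "c", "d", "e"],
                                ["a", "c", "d", "e", None], set())
```

In the example only the last row has to be redrawn; the other
three displaced lines reach their new positions through the
block move.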


7. The User Interface 


The Etude user interface is responsible for considerably 
more than simple command parsing. Given the need to 
support online help, menus of possibilities, and an undo 
operation, it has to know what the user is doing, what he 
has done recently, and what he can do next. 

7.1. Document Interface 

Although the user interface is constrained to manipulate 
Etude documents, their representation is too rich for that to 
happen directly. The interface therefore deals with two 
abstractions: cursors and regions. 

A cursor represents a location in a document. There can be 
many cursors in a document, but each subdocument has a 
single “main” cursor, which the user sees as his location in 
the subdocument he is editing. The principal operations on 
cursors are movement, copying, and text insertion and 
deletion; when the user types next 3 words, the user 
interface translates this into "move the main cursor over the 
next three words”; when he types x, the user interface 
translates this into “insert the character x at the main 
cursor.” 

A region is essentially a pair of cursors. To the user 
interface, all text objects, whether chapters or words, are 
seen only as regions: a cursor pointing to the beginning, 
and a cursor pointing to the end. Thus, erase next 3 words 
is translated into “acquire a region containing the next 
three words, and erase it.” 

7.2. The Screen 

Of the three windows normally displayed on Etude’s 
screen, the user interface is directly responsible for two: the 
status window and the interaction window. 

The status window displays, in addition to the time of day 
and system load, the current document type, the name of 
the current subdocument, and the logical location of the 
main cursor. The physical location of the cursor in the 
document is of course displayed directly on the screen, but 
a single physical location may correspond to many logical 
locations: the logically distinct positions of “the end of a 
paragraph" and “the end of the last sentence in a 
paragraph” are physically indistinguishable. At present, the 
logical location is shown as a list of all the components 
containing the cursor, ascending in the hierarchy; e.g., 

paragraph/subsection 2/section 8/article 

The interaction window is divided into two sections. The 
first is used to display error messages and responses to some 
commands; the second echoes commands as they are typed. 
All three sections of the user interface’s display are Etude 
documents, displayed through the normal redisplay system. 


7.3. Command Parsing 

Etude commands are entered as English imperatives: erase 
next 3 words, for example. Parsing of a command line is 
driven by the verb, which has associated with it a 
description of the number and type of arguments that it 
takes. Back-word takes no arguments, while move takes 
two: a region, and a location. The command parser begins 
by looking the verb up in one of several dictionaries; in 
Etude, some verb properties depend on the dictionary 
containing the verb. The lookup returns the verb’s 
description (if, indeed, the verb is valid), and the parser 
then interprets each of the arguments in turn. 

A location is always parsed by a recursive call on the 
command parser, with a different set of dictionaries. This 
provides the user with the full set of cursor-movement 
commands (↑, ↓, go-to, etc.), in addition to help and undo, 
with which to get to a location, but does not permit the 
intervening execution of other commands. The recursive 
call does not terminate until the user enters go ahead as a 
command, so a sequence of movement commands may be 
entered. Thus, 

move word (to) start-of next chapter ↓ ↓ 

would move the current word to the third line of the next 
chapter. 

A region is parsed in one of two ways, depending on the 
first character typed in the region definition. The user may 
define a region simply as two cursors, by typing begin, an 
arbitrary sequence of movement commands (handled via a 
recursive call), and end; or he may enter a symbolic region 
definition, such as next 3 words. 
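
A verb-driven parser of this shape can be sketched as follows.
The sketch works on a list of tokens rather than on keystrokes,
uses a single dictionary instead of several, and does not model
the optional "(to)"; the token names "down" and "go-ahead"
stand in for keys, and all function names and the noun list are
assumptions.

```python
VERBS = {                      # verb -> types of the arguments it takes
    "back-word": [],
    "erase": ["region"],
    "move": ["region", "location"],
}
NOUNS = {"word", "line", "sentence", "paragraph", "chapter"}


def is_noun(token):
    return token in NOUNS or (token.endswith("s") and token[:-1] in NOUNS)


def take_until(tokens, stop):
    """Consume tokens up to and including the stop token."""
    taken = []
    while tokens:
        token = tokens.pop(0)
        if token == stop:
            return taken
        taken.append(token)
    raise ValueError("expected " + stop)


def parse_region(tokens):
    if tokens and tokens[0] == "begin":        # explicit: begin ... end
        tokens.pop(0)
        return {"region": "explicit", "moves": take_until(tokens, "end")}
    phrase = []                                # symbolic: modifiers + noun
    while tokens:
        phrase.append(tokens.pop(0))
        if is_noun(phrase[-1]):
            return {"region": "symbolic", "phrase": phrase}
    raise ValueError("region has no noun")


def parse(tokens):
    tokens = list(tokens)
    verb = tokens.pop(0)
    if verb not in VERBS:
        raise ValueError("unknown verb " + verb)
    args = []
    for kind in VERBS[verb]:                   # the verb drives the parse
        if kind == "region":
            args.append(parse_region(tokens))
        else:                                  # a location ends at go-ahead
            args.append({"location": take_until(tokens, "go-ahead")})
    if tokens:
        raise ValueError("unexpected " + tokens[0])
    return {"verb": verb, "args": args}


command = parse("move word start-of next chapter down down go-ahead".split())
```

In the real system the location is parsed by a recursive call
on the command parser itself; here take_until merely collects
the movement tokens to keep the sketch short.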

A simple grammar constrains the user to enter reasonable 
phrases, composed of some modifiers and a single noun: 
start-of next chapter is accepted, while next previous 
chapter is not. Similarly, one can specify 3 words, but not 3 
documents. The user can edit the definition by using the 
back-word and back-space keys; beyond that, he can only 
cancel the command. 
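
The grammar itself is not given here, so the following is only
a guess at one that accepts and rejects the examples above: an
optional start-of or end-of, an optional next or previous, an
optional count, and exactly one noun, with counts disallowed
for "document". All names are hypothetical.

```python
NOUNS = {"word", "line", "sentence", "paragraph", "chapter", "document"}
UNCOUNTED = {"document"}       # assumed: nouns that cannot take a count


def valid_phrase(tokens):
    """Check a noun phrase against the assumed grammar."""
    tokens = list(tokens)
    if tokens and tokens[0] in {"start-of", "end-of"}:
        tokens.pop(0)
    if tokens and tokens[0] in {"next", "previous"}:
        tokens.pop(0)
    counted = bool(tokens) and tokens[0].isdigit()
    if counted:
        tokens.pop(0)
    if len(tokens) != 1:                       # exactly one noun must remain
        return False
    noun = tokens[0]
    singular = noun[:-1] if noun.endswith("s") else noun
    if singular not in NOUNS:
        return False
    return not (counted and singular in UNCOUNTED)
```

For instance, valid_phrase("start-of next chapter".split())
holds, while "next previous chapter" fails because two
direction modifiers leave more than one token where the noun
should be.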

One of the possibilities for the noun in a region definition is 
a component name, such as chapter. A table of these is 
maintained by the formatter; if the user wishes to enter one, 
he simply starts to type the name. When he attempts to 
confirm, if enough of the name has been typed to uniquely 
specify it, he will succeed; otherwise, the name will be 
completed as far as possible (or truncated, if the user has 
gone astray), and he will be informed that the name is still 
ambiguous. 
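
The confirmation behavior can be sketched with a small helper.
The function name complete and the list of component names are
hypothetical; os.path.commonprefix is used only because it
compares strings character by character.

```python
import os


def complete(typed, names):
    """Return (text, confirmed) for a partially typed component name."""
    matches = [n for n in names if n.startswith(typed)]
    if len(matches) == 1:
        return matches[0], True                # uniquely specified
    if matches:
        # Still ambiguous: extend as far as all candidates agree.
        return os.path.commonprefix(matches), False
    # The user has gone astray: truncate until something matches again.
    while typed and not any(n.startswith(typed) for n in names):
        typed = typed[:-1]
    return typed, False


names = ["chapter", "caption", "center", "paragraph", "page"]
```

With these names, typing "ch" and confirming yields "chapter",
typing "p" is extended only to "pa", and typing "chx" is cut
back to "ch" without being confirmed.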

The user may type menu at any time during this process. 
Etude will select the remaining possibilities from the list of 
component names, and put the cursor into a special window 
(overwriting part of the document display) in which they 
are displayed. The arrow keys (↑, →, ↓, and ←) are 
redefined to move among the possibilities, so the user can 
point to his choice. (In later versions, we hope to have 
some physical pointing device, such as a mouse, which 
could make this less cumbersome.) He may also continue 
typing the name, having seen the possibilities; this is in 
keeping with our philosophy of providing help without 
getting in the way of the experienced user. The menu 
display is a multi-column layout, provided by the standard 
formatting facilities. 

The command parser is strongly influenced by the pseudo- 
English parsers of computer-based games like Zork and 
Adventure [5], which have found wide acceptance among 
novice users. The present implementation is lacking in 
several respects: menus are only available in a few cases, 
and command-line editing is severely restricted. However, 
it does provide at least a sample of all the desired facilities, 
and implements an easy-to-understand and easy-to-remember
command language.

7.4. Command Execution 

The user interface must keep track of what the user has 
done and what he is doing in order to implement help and 
undo. As it processes a command, the parser builds a node 
tree, where each node represents a command or a command 
argument. The node for a move command, for example, 
has two children; one for the region definition, and one for 
the location moved to. It also contains a cursor pointing to 
the original location of the region and a copy of the region, 
so that it can undo the operation. The location node might 
have many children, one for each cursor movement 
command entered during the operation. The node tree 
thus provides a complete description and history of the 
user’s actions. 

Help is also supported by the node structure. Each node 
type has associated help information expressed in past, 
present, and future tenses. (Each form is required since 
help is dependent on the context in which it is invoked.) In 
the current version, help messages are quite curt. They
would correspond to the first level of detail in a general 
message, but successively more detailed information will be 
needed to support a query-in-depth facility. 

When help is typed, the tree is traced starting at top level. 
The user is informed of the past few operations he has done 
and of what he is currently doing. If the current node type 
is “top level”, then the standard list of available operations 
is retrieved. Otherwise, the node searches down the most 
recent branch of the tree to provide information about what 
has been done within the operation currently being 
performed, and what remains to be done. This information 
is assembled into a temporary document and presented to 
the user. 


While history information is useful for undo and help, 
carrying all this information around in memory is certainly 
not feasible. A compile-time parameter determines how
many operations are to be kept within memory; older
operations are then deleted. In the next version, they will
be written out to disk, so that they can be retrieved when
necessary. 

8. The Virtual Terminal Interface 

The Etude prototype runs on a DECSYSTEM-20 and has 
access to the Nu terminal over a low speed (9600 baud) 
“terminal” line. This necessitates the use of coded 
communication to initiate tasks that manage the screen. In 
the prototype, the workstation and its associated driver 
program are viewed as an advanced terminal with the 
capability of displaying multiple character sets of different 
types and faces consisting of variable pitch characters.
Because the low speed nature of the communication line 
prohibited the sharing of responsibility between Etude and 
the Nu, the Nu was primarily viewed as an output device. 
However, the Nu in its emulation of a terminal attempted 
to provide a number of high-level commands that would 
shield Etude from performing the detailed and tedious
operations that would be required if Etude were driving a 
traditional terminal. 

The interface of the Nu terminal emulator
provides two classes of commands for text applications. In
addition to cursor movement and character oriented 
commands such as changing the mode in which a character 
is “painted" on the screen, the interface provides for 
commands geared towards screen management that take 
advantage of the unique features of the bit-mapped display. 
The work-horse of this class of commands is the array 
operation. Every other screen management function is 
either a special case of the array operation, or can be broken
down into a sequence of such operations. 

An array operation is an operation on a rectangular array of 
bits in the display memory. Two arrays (the source and the 
destination arrays) are specified to the operation. The two 
arrays are operated upon bit by bit and the result is stored 
in the destination array. The operation may be one of 
sixteen possible boolean operations on two arguments. 

This array operation is similar to the RasterOp function of
Newman and Sproull [7] and the BitBlt instruction of the 
Alto Personal Computer [11]. It has the added feature that 
the source and destination rectangles need not be of the 
same size. If the source array is larger than the destination 
array in a dimension, it will be truncated in that dimension. 
If the source array is smaller in a dimension, it will be 
replicated in that dimension. Thus if the source is a half-
toning pattern such as a 4 × 4 raster, the destination will be
filled with that pattern (this is similar to the ADISRegionOp
in [10]). 
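
A minimal sketch of such an array operation is given below, in Python. It is an illustration under stated assumptions, not the Nu terminal's code: bitmaps are modelled as nested lists of 0/1 values, the destination rectangle is the whole `dest` list, and the encoding of the sixteen boolean operations as a 4-bit truth table (with the names `COPY`, `XOR`, `OR`, `INVERT_DEST`) is chosen here for convenience.

```python
# Bit (2*s + d) of an operation code gives the result for
# source bit s and destination bit d, so codes 0..15 cover
# all sixteen boolean functions of two arguments.
COPY = 0b1100         # result = s
XOR = 0b0110          # result = s xor d
OR = 0b1110           # result = s or d
INVERT_DEST = 0b0101  # result = not d ("reverse video")

def array_op(dest, src, op):
    """Combine src into dest in place, bit by bit, and return dest."""
    src_h, src_w = len(src), len(src[0])
    for y, row in enumerate(dest):
        for x, d in enumerate(row):
            # A smaller source is replicated; a larger one is
            # truncated because only dest coordinates are visited.
            s = src[y % src_h][x % src_w]
            row[x] = (op >> (2 * s + d)) & 1
    return dest
```

Copying a 2 × 2 checkerboard source into a 4 × 4 destination with `COPY` tiles the pattern across the destination, which is the half-tone fill described above.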




Most terminal operations can be decomposed into array 
operations: e.g., clearing the screen or an area of the screen
can be achieved by one operation. (If the area is not 
rectangular, it may be broken down into a sequence of 
rectangular operations.) Likewise, deleting n lines may be 
accomplished by two array operations. Writing characters 
is just another array operation where the source is in the 
fonts area. "Reverse video" consists of complementing the 
bits in the destination rectangle. An area on the screen or a 
group of characters on the screen may be highlighted by 
specifying the source to be a half-tone pattern. 

The array operation makes it unnecessary for Etude to
retransmit characters as long as the relative position of 
characters within the group does not change drastically 
during the redisplay attempt. Hence redisplay throughput 
is increased. This is especially significant when the screen 
contains a large number of characters and the communi- 
cation line makes retransmission costly. The array operation 
may also be used as a primitive for other higher level screen 
management functions; for example, window display and 
maintenance can be easily translated into sequences of 
array operations. 

9. Summary 

We have presented an overview of the Etude document
production system and its prototype implementation. The
Etude system strives to provide high functionality with low
interface complexity by structuring the operator’s com- 
mand language and supporting the entire user interaction 
process. In order to provide these capabilities, a new 
approach to the implementation of a text processing system 
is needed. The key issue is one of document representation:
providing a structure that can both support the rich
functionality and provide an acceptable level of perfor- 
mance. 

The Etude effort is continuing. A human factors evaluation 
of the interface has been designed and is about to 
commence. The system itself is being migrated to run 
entirely on a "personal" single-user machine, with no
mainframe in the background. Etude is being used as the 
base of a complete integrated office workstation, which will 
provide such facilities as electronic mail, database manage- 
ment, graphics and image processing, and more. The 
architecture of the system is being revised and extended, 
but the implementation concepts presented here will 
continue to be used. 


References 

1. Hammer, Michael et al. Etude: An Integrated and 
Interactive Document Production System. Proceedings of 
the 1981 Office Automation Conference, AFIPS,
March, 1981.

2. Ilson, Richard and Michael Good. Etude: An Interactive
Editor and Formatter. Memo OAM-029, MIT Lab.
for Computer Science, Office Automation Group, March, 
1981. 

3. Knuth, Donald E. TEX and METAFONT: New Direc- 
tions in Typesetting. American Mathematical Society and 
Digital Press, 1979. 

4. Lampson, Butler W. Bravo Manual. In Alto User’s
Handbook, Xerox PARC, 1979. 

5. Lebling, P. David, Marc S. Blank, and Timothy
A. Anderson. Zork: A Computerized Fantasy Simulation
Game. Computer 12, 4 (April 1979), 51-59. 

6. Liskov, Barbara et al. CLU Reference Manual. Tech. 
Rep. 225, MIT Lab. for Computer Science, Oct., 1979. 

7. Newman, William M. and R. F. Sproull. Principles of 
Interactive Computer Graphics. McGraw-Hill, New York, 
1979. Second Edition. 

8. Reid, Brian K. A High-Level Approach to Computer 
Document Formatting. Conference Record of the Seventh 
Annual ACM Symposium on Principles of Programming 
Languages, ACM, Jan., 1980, pp. 24-31. 

9. Reid, Brian K. and Janet H. Walker. Scribe Introductory 
User's Manual. Third edition, Unilogic, Ltd., 605 
Devonshire St., Pittsburgh PA, 15213, 1980. 

10. Sproull, Robert F. Raster Graphics for Interactive
Programming Environments. Tech. Rep. CSL-79-6, Xerox
PARC, June, 1979. 

11. Thacker, C. P. et al. Alto: A Personal Computer.
Tech. Rep. CSL-79-11, Xerox PARC, Aug., 1979.

12. Ward, Stephen A. and Christopher J. Terman. An 
Approach to Personal Computing. Digest of Papers, 
Compcon’80, IEEE, Feb., 1980, pp. 460-465. 




EMACS 

The Extensible, Customizable 
Self-Documenting Display Editor 

Richard M. Stallman 
Artificial Intelligence Lab 
Massachusetts Institute of Technology 
Cambridge, MA 02139 


Abstract 

EMACS is a display editor which is implemented in an 
interpreted high level language. This allows users to extend the 
editor by replacing parts of it, to experiment with alternative
command languages, and to share extensions which are generally
useful. The ease of extension has contributed to the growth of a
large set of useful features. This paper describes the organization
of the EMACS system, emphasizing the way in which
extensibility is achieved and used. 

This report describes work done at the Artificial Intelligence 
Laboratory of the Massachusetts Institute of Technology. Support for the 
laboratory's research is provided in part by the Advanced Research 
Projects Agency of the Department of Defense under Office of Naval 
Research contract N00014-80-C-0505. 


1. Introduction

EMACS 1 is a real-time display editor which can be extended 
by the user while it is running. 

Extensibility means that the user can add new editing 
commands or change old ones to fit his editing needs, while he is 
editing. EMACS is written in a modular fashion, composed of 
many separate and independent functions. The user extends 
EMACS by adding or replacing functions, writing their 
definitions in the same language that was used to write the
original EMACS system. We will explain below why this is the
only method of extension which is practical to use; others are
theoretically equally good but discourage use, or discourage 
nontrivial use. 

Extensibility makes EMACS more flexible than any other 
editor. Users are not limited by the decisions made by the 
EMACS implementors. What we decide is not worth while to 


1 EMACS stood for Editing Macros, before we realized that EMACS is 
composed of functions written in a programming language rather than 
macros in the editor TECO. 

Permission to copy without fee all or part 
of this material is granted provided that the 
copies are not made or distributed for direct 
commercial advantage, the ACM copyright 
notice and the title of the publication and 
its date appear, and notice is given that 
copying is by permission of the Association 
for Computing Machinery. To copy otherwise, 
or to republish, requires a fee and/or 
specific permission. 

© 1981 ACM 0-89791-043-5/81/0600/0147 $00.75


add, the user can provide for himself. He can just as easily
provide his own alternative to a feature if he does not like the way 
it works in the standard system. 

A coherent set of new and redefined functions can be bound 
into a library so that the user can load them together
conveniently. Libraries enable users to publish and share their 
extensions, which then become effectively part of the basic 
system. By this route, many people can contribute to the 
development of the system, for the most part without interfering
with each other. This has led the EMACS system to become 
more powerful than any previous editor. 

User customization helps in another, subtler way, by making 
the whole user community into a breeding and testing ground for 
new ideas. Users think of small changes, try them, and give them 
to other users. If an idea becomes popular, it can be incorporated 
into the core system. When we poll users on suggested changes, 
they can respond on the basis of actual experience rather than 
thought experiments. 

To help the user make effective use of the copious supply of 
features, EMACS provides powerful and complete interactive 
self-documentation facilities with which the user can find out
what is available. 

A sign of the success of the EMACS design is that EMACS 
has been requested by over a hundred sites and imitated at least 
ten times. 


1.1. Background: Real-Time Display Editors

By a display editor, we mean an editor in which the text being 
edited is normally visible on the screen and is updated 
automatically as the user types his commands. No explicit 
commands to "print" text are needed. 

As compared with printing terminal editors, display editor 
users have much less need for paper listings, and can compose 
code quickly on line without writing it on paper first. Display 
editors are also easier to learn than printing terminal editors. This
is because editing on a printing terminal requires a mental skill 
like that of blindfold chess: the user must keep a mental image of 
the text he is editing, which he cannot easily see, and calculate 
how each of his editing command "moves" changes it. A display 
editor makes this unnecessary by allowing the user to see the 
"board". 

Among display editors, a real-time editor is one which 
updates the display very frequently, usually after each one- or
two-character command the user types. This is a matter of the input
command language. Most printing terminal editors read a string
of commands and process it all at once: a useful feature on a 
printing terminal. For example, there is usually an "insert" 
command which inserts a string of characters. When such editors
are adapted to display terminals, they often update the display at
the end of a command string; thus, the insertion would be shown
all at once when it was over. It is more helpful to display each 
inserted character in its position in the text as soon as it has been 
typed. 

A real-time display editor has (primarily!) short, simple 
commands which show their effects in the display as soon as they 
are typed. In EMACS, text (printing characters and formatting
characters) is inserted just by typing it; there is no "insert"
command. In other words, each printing character is a command 
to insert that character. The commands for modifying text are 
nonprinting characters, or begin with nonprinting characters. 
Many-character commands echo if typed slowly: if there is a 
sufficiently long pause, the command so far is echoed, and then 
the rest of the command is echoed as it is typed. Aside from this,
EMACS acknowledges commands by displaying their effects.

EMACS is not the first real-time display editor, but it derives 
much appeal from being one. It is not necessary to know how to 
program, or how to extend EMACS, to use it successfully. 


2. Applications of Extensibility 

To illustrate and demonstrate the flexibility which EMACS 
derives from extensibility, here is a summary of many of the 
features, available to EMACS users without the need to program, 
to which extensibility has contributed. Many of them were 
written by users; some were written by the author, but could just
as well have been written by users. 


2.1. Customization

Many minor extensions can be done without any 
programming. These are called customizations, and are very
useful even by themselves. For example, for editing a program in 
which comments start with <** and end with **>, the user can tell 
the EMACS comment manipulation commands to recognize and 
insert those strings. This is done by setting parameters which the
comment commands refer to. It is not necessary to redefine the 
commands themselves. Another sort of customization is 
rearrangement of the command set. For example, some users
prefer the four basic cursor motion commands (up, down, left and 
right) on keys in a diamond pattern on the keyboard. It is easy to 
reassign the commands to these positions. It is also possible to 
rearrange the entire command set according to a different 
philosophy. 


2.2. Operating on Meaningful Units of Text 

EMACS can be programmed to understand the syntax of the 
language being edited and provide operations particular to it. 
Many major modes are defined, one for each language which is 
understood. Each major mode has the ability to redefine any of
the commands, and reset any parameters, so as to customize
EMACS for that language. Files can contain special text strings
that tell EMACS which major mode to use in editing them. For
example, -*-Lisp-*- anywhere in the first nonblank line of a
file says that the file should be edited in Lisp mode. The string
would normally be enclosed in a comment.
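
As a rough illustration of this convention, the Python sketch below looks for such a string in the first nonblank line of a file's text. The function name `major_mode_for` and the exact pattern (optional blanks around the mode name) are assumptions of the example, not a description of how EMACS itself scans the file.

```python
import re

def major_mode_for(text):
    """Return the mode named by -*-Mode-*- in the first nonblank line, or None."""
    for line in text.splitlines():
        if line.strip():
            # Only the first nonblank line is examined.
            match = re.search(r"-\*-\s*([^*\s]+)\s*-\*-", line)
            return match.group(1) if match else None
    return None
```

For a file whose first nonblank line is the comment `;; -*-Lisp-*-`, the sketch returns `Lisp`; the same string on a later line is ignored.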

For editing English text, commands have been written to 
move the cursor by words, sentences and paragraphs, and to 
delete them; to fill and justify paragraphs; and to move blocks of
text to the left or to the right. Other commands convert single
words or whole regions to upper or lower case. There are also
commands which manipulate the command strings for text
justifier programs: some insert or delete underlining commands,
and others insert or delete font-change commands.

Many commands are controlled by parameters which can be
used to further adapt them to particular styles of formatting. For
example, the word moving and deletion commands have a syntax
table that says which characters are parts of words. There are two
commands to edit this table, one convenient for programs to use
and an interactive one for the user. The paragraph commands
can be told which strings, appearing at the beginning of a line, 
constitute the beginning of a paragraph. Such parameters can be 
set by the user, or by a specification in the file being edited. But 
normally they are set automatically by the major mode (that is, by 
telling EMACS what language the file is written in) and do not 
require attention from the user. 


2.3. Redefining Self-Inserting Characters 

A very powerful extension facility is the ability to redefine the 
graphic and formatting characters as commands. These 
characters, which include letters, digits and punctuation, are 
normally all defined as commands to insert themselves into the 
text. Useful alternate definitions for these characters usually 
insert the character as usual, and then do additional processing 
which is in some way meaningfully associated with the insertion 
of that character. 

The single most useful command for editing text is the "auto- 
fill space". It is a program intended to be used as the definition of 
the space character. In addition to inserting a space, it breaks the 
line into two lines if it has become too long. With the space
character redefined in this way, the user can type endlessly
ignoring the right margin, and the text is divided into lines of a
reasonable length. Of course, this feature is not always desirable. 
It is turned on or off by redefining the space command. If the 
auto-fill space did not exist, any user could write it and also the 
command to turn it on and off. 
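
The idea can be sketched in a few lines of Python. This is an illustration only, not the TECO implementation: the buffer is modelled as a list of lines with the cursor at the end of the last line, and the names `self_insert`, `auto_fill_space`, `type_text` and the `keymap` dictionary are assumptions of the example.

```python
def self_insert(lines, ch, fill_column):
    """Default definition of a printing character: insert it."""
    lines[-1] += ch

def auto_fill_space(lines, ch, fill_column):
    """Alternate definition of space: break an over-long line, then insert."""
    line = lines[-1]
    if len(line) > fill_column:
        # Break at the last space that keeps the first part within the margin.
        cut = line.rfind(" ", 0, fill_column + 1)
        if cut > 0:
            lines[-1] = line[:cut]
            lines.append(line[cut + 1:])
    lines[-1] += ch

def type_text(text, keymap, fill_column=70):
    """Feed characters to the dispatcher; undefined keys insert themselves."""
    lines = [""]
    for ch in text:
        keymap.get(ch, self_insert)(lines, ch, fill_column)
    return lines
```

Turning the feature on or off amounts to changing one entry: `type_text(text, {" ": auto_fill_space})` fills as the user types, while `type_text(text, {})` leaves the space character as an ordinary self-inserting character.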

A bolder use of redefinition of self-inserting characters is the 
abbreviation facility, part of the standard EMACS system but still
implemented as an extension maintained by the user who wrote
it. The abbreviation facility allows the user to define
abbreviations for words, and then type the abbreviations in order
to insert the words. For example, if "cd" were defined as an
abbreviation for "command", typing "i/o-cd" would insert
"i/o-command" into the text. Abbreviation expansion preserves case,
so "Cd" would expand into "Command". Abbreviation works by 
redefining all punctuation characters (the list of which can be 
altered by customization) to run a program which looks at the 
preceding word and, if it is a defined abbreviation, replaces it 
with its expansion. 
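
A minimal Python sketch of this mechanism follows. It assumes the text before the cursor is a plain string, that words consist of letters only, and that case preservation means capitalizing the expansion when the abbreviation is capitalized; the names `expand_abbrev` and `punctuation` are hypothetical.

```python
def expand_abbrev(text, abbrevs):
    """Replace the word at the end of text by its expansion, if it has one."""
    i = len(text)
    while i > 0 and text[i - 1].isalpha():
        i -= 1
    word = text[i:]
    expansion = abbrevs.get(word.lower())
    if not word or expansion is None:
        return text
    if word[0].isupper():
        # Preserve case: "Cd" expands to "Command".
        expansion = expansion[0].upper() + expansion[1:]
    return text[:i] + expansion

def punctuation(text, ch, abbrevs):
    """Definition given to each punctuation character: expand, then insert."""
    return expand_abbrev(text, abbrevs) + ch
```

With `{"cd": "command"}` defined, typing a period after `i/o-cd` yields `i/o-command.`, because the scan for the preceding word stops at the hyphen.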

Yet another application of redefining printing characters is 
automatic parenthesis-matching. When this feature is in use, 
every time the user inserts a close-parenthesis, the cursor moves
briefly to the matching open-parenthesis, then back again. 
Automatic matching is especially useful in editing Lisp code, but 
it is helpful with most other programming languages also. It is 
implemented by redefining the close-parenthesis character.


2.4. Editing Programs 

Extensibility is especially useful for editing programs. One 
might conceivably design in advance all the editing commands 
needed for editing English text, but each programming language 
has its own set of useful syntactic operations, which suggest useful 
editing commands. Because languages differ so much, simple 
customization is not in general enough to implement familiar 
operations for a new language. A new extension package is 
required. 

EMACS commands have been written, for many languages, 
to move over or kill balanced expressions, to move to the 
beginning or end of a function definition, and to insert or align 
comments. But the most useful editing operation for programs,
and the first one to be implemented for any programming
language, is automatic indentation. 

The structure of a program can be made clear at a glance by 
adjusting the indentation of each line according to its level of 
nesting. Most programming communities attempt to indent code
properly but do it manually. Automatic indentation is used
mostly by Lisp programmers.

Automatic indentation was traditionally done by a program 
which would read in an entire source file, rearrange the 
indentation, and write out a corrected source file. Such a tool has 
several disadvantages. For one thing, processing the entire file is 
likely to take a while. For another, the tool insists on imposing its 
own idea of proper formatting, which the user cannot override. 
Even after a lot of effort is put into heuristics for good 
indentation, users are still dissatisfied. 

Automatic indentation in EMACS is done incrementally.
The Tab character is redefined, as a command, to update the 
indentation of the current line only, based on the existing 
indentation of the preceding lines. The Tab command is used on 
lines whose nesting has changed. With it, the user can indent 
code properly as it is first typed in. If he does not agree with the 
Tab command’s choice of indentation, he can override it. 

Because the indentation function must understand the syntax 
of the programming language being edited, each language 
requires a separate indentation function. It is the job of the major 
mode for each programming language to redefine the Tab
character to run an appropriate indenter. Users can always use
the same command to indent, no matter what sort of program
they are editing. In addition, another editing command can do
indentation by calling the current definition of Tab as a
subroutine. (One such function is the one which indents several 
consecutive lines.) 
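
The convention can be illustrated with a small Python sketch. Everything here is hypothetical: the two toy indenters, the `keymap` dictionary and `set_major_mode` stand in for the real major modes, and the Lisp indenter merely counts unclosed parentheses in the preceding lines rather than reproducing the EMACS algorithm.

```python
def indent_lisp(lines, i):
    """Toy Lisp indenter: two spaces per parenthesis left open above line i."""
    depth = sum(l.count("(") - l.count(")") for l in lines[:i])
    lines[i] = "  " * max(depth, 0) + lines[i].strip()

def indent_text(lines, i):
    """Toy text indenter: copy the indentation of the previous line."""
    prev = lines[i - 1] if i > 0 else ""
    lines[i] = prev[: len(prev) - len(prev.lstrip())] + lines[i].strip()

keymap = {"Tab": indent_text}

def set_major_mode(mode):
    """Each major mode redefines Tab to run its own indenter."""
    keymap["Tab"] = {"lisp": indent_lisp, "text": indent_text}[mode]

def indent_region(lines, start, end):
    """Language-independent command: call whatever Tab currently means."""
    for i in range(start, end):
        keymap["Tab"](lines, i)
```

Note that `indent_region` never mentions Lisp: it works for any language whose major mode has installed an indenter under Tab.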

Conventions such as this are vital, in an extensible system, for 
enabling unrelated extensions to avoid interacting wrong; one 
user can write an indentation function for a new language, while 
another user writes new language-independent operations for 
requesting indentation, and the two automatically work properly 
together. 

Languages which have support for indentation include Lisp,
Pascal, PL/I, Bliss, BCPL, Muddle and TECO. 

Comprehension of the user's program reaches its greatest 
heights for Lisp programs, because the simplicity of Lisp syntax
makes intelligent editing operations easier to implement, while 
the complexity of other languages discourages their users from 
implementing similar operations for them. In fact, EMACS 
offers most of the same facilities as editors such as the Interlisp 
editor which operate on list structure, but combined with display
editing. The simple syntax of Lisp, together with the powerful
editing features made possible by that simple syntax, add up to a 
more convenient programming system than is practical with other 
languages. Lisp and extensible editors are made for each other, in 
this way. We will see below that this is not the only way. 


2.5. Editing Large Programs 

Large programs are composed of many functions divided
among many files. It is often hard to remember which file a given 
function is in. An EMACS extension called the TAGS package 
knows how to keep track of this. 

The TAGS package makes use of a file called a tag table,
which records each function in the program, stating what file it is 
defined in and at what position in the file. The tag table is made 
by running a special program named TAGS, which is not part of
EMACS. Once the tag table is loaded into EMACS, the
command Meta-Period 2 finds the definition of any function,
using the information in the tag table to select the proper file and 
find the function in it. 

The positions within the source file, remembered in the tag
table, are used to find the function in the file instantly. Changing
the file makes the remembered positions inaccurate. If this has
happened, Meta-Period searches in both directions away from the
remembered position until it finds the definition. So small 
inaccuracies cause only slight delays. 
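
The outward search can be sketched as follows in Python. The sketch assumes positions are line indices and that the caller supplies a predicate recognizing the wanted definition; the real tag table may well record character positions, and the name `find_definition` is hypothetical.

```python
def find_definition(lines, remembered, is_definition):
    """Search outward from the remembered line; return the index found, or None."""
    if not lines:
        return None
    # The file may have shrunk since the tag table was made.
    remembered = min(max(remembered, 0), len(lines) - 1)
    for distance in range(len(lines)):
        for pos in (remembered - distance, remembered + distance):
            if 0 <= pos < len(lines) and is_definition(lines[pos]):
                return pos
    return None
```

Because the search widens one line at a time in both directions, a definition that has drifted only a few lines from its remembered position is found after only a few probes.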

When many new functions have been added, or moved from
one file to another, the TAGS program can reprocess the tag table 
into an updated one. To make this more automatic, the tag table 
also remembers which language each source file is written in. 
This information is needed for recognizing the function 
definitions in the file. 


2.6. Editing Other Things 

Interactiveness is useful in many activities aside from editing 
text. For example, reading and replying to mail from other users 
ought to be interactive. Many of these activities occasionally 
involve text editing: for example, editing the text of a reply. If a 
special editor is implemented for the purpose, it can easily be 
much more work to write than all the rest of the system. It is
easier to write the other interactive system within the framework 
of an extensible editor. 

EMACS has two extensions, RMAIL and BABYL, for
reading mail. Commands in RMAIL and BABYL are not like 
EMACS commands: typical commands include "D" for "delete 
this message", and "R" for "reply to this message". Editing the 
text of the reply is done with ordinary EMACS commands. 

DIRED is used for editing a file directory. The normal 
editing commands, as extended, can be used to move the cursor 
through the directory listing. Other special commands defined
only in DIRED delete, move, compare or examine the file whose 
name is under the cursor. 

The INFO extension is designed for reading tree-structured 
documentation files. These files are divided textually into nodes, 
which contain text representing pointers to other nodes. INFO 
displays one node at a time, and INFO commands move from one 
node to another by following the pointers. 


3. The Organization of the EMACS System 

The primary components of the EMACS system are the text 
manipulation and I/O primitives, the interpreter, the command 
dispatcher, the library system, and the display processor. 

The text and I/O primitives are used to operate on the text 
under the command of the program. The interpreter executes 
programs, using the primitives when called for. The command 
dispatcher remembers which program corresponds to each 
possible input character; it reads a character from the terminal 
and calls the associated function. The library system associates 
functions with their names and documentation, and allows groups 
of related functions to be loaded quickly together. The display 
processor updates the screen to match the text as changed by the 
text primitives; it is run whenever there is nothing else to do. 


2 "Meta" is the name of a shift key on the ideal EMACS terminal. On
terminals which do not have this key, the ASCII character Escape is used as 
a prefix instead. 





3.1. Editing Language vs. Programming Language

An EMACS system actually implements two different
languages, the editing language and the programming language. 
The editing language contains the commands users use for 
changing text. These commands are implemented by programs 
written in the programming language. When we speak of the 
interpreter, we mean the one which implements the programming
language. The editing language is implemented by the command
dispatcher. 

Previous attempts at programmable editors have usually 
attempted to mix programming constructs and editing in one 
language. TECO is the primary example of this sort of design. It
has the advantage that once the user knows how to edit with the 
system, he need only learn the programming constructs to begin 
programming as well. 

However, there are considerable disadvantages, because what
is good in an editor command language is ugly, hard to read, and 
grossly inefficient as a programming language. A good interactive 
editing language is composed primarily of single-character 
commands, with a few commands that introduce longer names for 
less frequently used operations. As a programming language, it is 
unreadable. If the editor is to be customizable, the user must be 
able to redefine each character. This in a programming language 
would be intolerable! 

When the programming language is the editing language, the 
built-in editing commands and the primitive operations they use 
have to be written in another language. Then the user cannot 
change part of the standard system slightly by making a small
change to its definition: it has to be reimplemented from scratch
as a macro. Since the primitives available are only the commands
he uses for editing, this will often be impossible because the
necessary primitives will be internal routines that the user cannot
call. The primitives that an extension would like to use are not
always the same as the editing operations the user wants. 

The implementor of a macro processor is encouraged to 
ignore such deficiencies because he himself does not use the 
language in implementing the rest of the system. Since it is 
traditional, in designing a macro language, to ignore the standards 
of readability, power and robustness typically applied to the 
design of programming languages, these deficiencies are usually 
considerable. The original TECO is a good example of this sort 
of problem. 

In EMACS, each language is designed for its purpose. The 
editing language has single-character redefinable commands. The
programming language is TECO, modified and extended to be 
more suitable for writing well-structured and robust programs, 
and to provide the primitives needed by editing programs as 
opposed to editor users. It remains hard to read, so the 
descendents of EMACS generally use Lisp instead. TECO was 
used only for reasons of historical convenience. 

More information on the requirements extensibility imposes 
on the system's programming language is in the next chapter. 


3.2. The Library System and the Command Dispatcher 

An important part of any practical extensible system is the 
ability to use more than one extension at one time, and begin 
using an additional extension at any time. Extensions should be 
able to override or replace parts of the standard system, or 
previous extensions. In EMACS the library system is responsible 
for accomplishing this. 

An EMACS library is a collection of function names, 
definitions and documentation that can be loaded into an 
EMACS in mid-session. Libraries are read-only and position- 
independent, so that they can be loaded just by incorporating 
them into the virtual memory of the EMACS. This allows all 
EMACS's using a library to share the physical memory. Each 
library contains its own symbol table which connects function 
names with definitions, and also with their documentation strings. 
Libraries are generated from source files in which each function 
definition is accompanied by its documentation: this encourages 
all functions to be documented. 

When a function name is looked up, all the loaded libraries 
are searched, most recently loaded first. For the sake of 
uniformity, the standard EMACS functions also reside in a 
library, which is always the first one loaded. Therefore, any 
library can override or replace the definition of a standard 
EMACS function with a new definition, which will be used 
everywhere in place of the old. This, together with the fact that 
EMACS is constructed with explicit function calls to named 
subroutines at many points, makes it easy for the user to change 
parts of the system in a modular fashion without replacing it all. 
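
The search order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Library class, the loaded list, and the sample functions are hypothetical names, not EMACS internals.

```python
# Hypothetical model of library lookup; names are illustrative only.
class Library:
    def __init__(self, name, functions):
        self.name = name
        # Symbol table: function name -> (definition, documentation string)
        self.functions = dict(functions)

loaded = []  # oldest first; the standard library is always loaded first

def load_library(lib):
    loaded.append(lib)

def lookup(function_name):
    # Search the most recently loaded library first, so that a later
    # library overrides the definitions of earlier ones.
    for lib in reversed(loaded):
        if function_name in lib.functions:
            return lib.functions[function_name]
    raise KeyError(function_name)

load_library(Library("EMACS", {
    "Forward Word": (lambda: "standard", "Move forward one word."),
}))
load_library(Library("MYLIB", {
    "Forward Word": (lambda: "custom", "Move forward, my way."),
}))
definition, doc = lookup("Forward Word")
```

Because every caller goes through the lookup, the override takes effect everywhere the name is used, without touching the standard library.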

Subroutines are normally called by their full names. The user 
can also call any command by name, and many commands are 
primarily intended to be used in that way. However, the most 
common editing operations need to be more easily accessible. 
This is the purpose of the command dispatcher, which reads one 
character and looks it up in the dispatch table, a vector of 
definitions, to find the function to be called (the definition-object, 
not the name). 

Functions residing in the dispatch table can be invoked either 
by the character command or by name. A function which does 
not appear in the dispatch table can be called only by name. The 
user calls functions by name by means of a single-character 
command (Meta-X) whose definition is to read the name of a 
function and call that function. 

Each user has his own patterns of use. Many functions in 
EMACS are accessible only by name because we expect most 
users to use them infrequently. If a particular user uses one such 
command often, he can place the definition in the dispatch table 
using the function Set Key. The function calling conventions are 
designed so that almost any function definition will behave 
reasonably if called by the command dispatcher. If a function 
tries to read a string argument from its caller, then when called by 
the command dispatcher it will automatically prompt and read 
the argument from the terminal instead. 3 
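
A minimal Python sketch of the two calling paths follows. It assumes a plain dictionary for the dispatch table and invented key names such as "M-F"; the helper names are hypothetical, and set_key is only loosely modeled on Set Key.

```python
functions = {}       # function name -> definition (a callable)
dispatch_table = {}  # command character -> definition object, not the name

def define(name, fn):
    functions[name] = fn

def set_key(char, name):
    # Store the definition itself in the table, as the text describes.
    dispatch_table[char] = functions[name]

def dispatch(char):
    # The command dispatcher: look the character up and call the function.
    return dispatch_table[char]()

def call_by_name(name):
    # Roughly what Meta-X does after it has read a function name.
    return functions[name]()

define("Forward Word", lambda: "moved forward one word")
define("Count Lines", lambda: "counted lines")
set_key("M-F", "Forward Word")
```

Here "Count Lines" can be reached only through call_by_name until the user places it on a character with set_key.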

Some libraries contain functions that are intended to be called 
with single-character commands. The library can arrange to place 
those functions' definitions in the dispatch table by defining a 
function called Setup. This will be called automatically when the 
library is loaded, and it can redefine character commands as 
needed. However, because EMACS is intended to be customized, 
no library can reasonably make the assumption that a function 
belongs on a particular character without allowing the user who 
loads the library to override that assumption. For example, a 
library might wish to redefine Control-S on the assumption that it 
invokes the search function, but a user might prefer to keep his 
search on Control-T instead, and he might prefer that same 
library to alter the definition of Control-T when loaded by him. 
The author of the library cannot anticipate the details of such 
idiosyncrasies, but he can provide for them all by following a 
convention: in the Setup function of the library (TAGS, say), he 
checks for a variable called TAGS Setup Hook, and if it exists, 
its value is called as a function instead of the usual setting up. 
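
The convention can be illustrated with a short Python sketch. The variable table, the function name "Tags Search", and the key assignments are assumptions made for the example; only the name TAGS Setup Hook comes from the text.

```python
variables = {}  # global variable name -> value
dispatch_table = {"C-S": "Search", "C-T": "Transpose"}

def tags_default_setup():
    # What the library author assumes most users want (hypothetical).
    dispatch_table["C-S"] = "Tags Search"

def tags_setup():
    # Called automatically when the library is loaded.
    hook = variables.get("TAGS Setup Hook")
    if hook is not None:
        hook()                  # the user's hook replaces the usual setting up
    else:
        tags_default_setup()

def my_hook():
    # This user keeps searching on Control-T, so the library goes there.
    dispatch_table["C-T"] = "Tags Search"

variables["TAGS Setup Hook"] = my_hook
tags_setup()
```

The library author never needs to know which key this user prefers; the hook carries that knowledge.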


3 The process of reading the argument from the terminal is implemented 
by a function which the user can replace. 




3.3. The Display Processor 

The display processor is the part of EMACS which maintains 
on the display screen an up-to-date image of the text inside the 
editor. Since the size of the screen is limited, only a portion or 
"window" can be shown. The display processor prefers to 
continue to start its display at the same point in the file, so as to 
minimize the amount of changes necessary to the screen. 
However, the text where the editor's own cursor is located must 
appear on the screen so that the terminal's cursor can show where 
it is. This sometimes forces a new window position to be 
computed. The user can also command changes in the window 
position, moving the text up or down on the screen. 

The EMACS display processor embodies an unusual principle 
which makes for much faster responses to the user: display 
updating has lower priority than cogitation. 

Most display editors change the display after each user 
command. This is the simplest strategy to implement, since each 
command knows precisely how it has changed the text. But it is 
very inefficient, not just of the computer's time, but of the user's 
time, because it makes the user wait for the completion of display 
updates that have already been made obsolete by further 
commands waiting to be executed. 

Here is an example of the problem. If the user types Carriage 
Return to create a new line, all the lines below that point need to 
be redisplayed in their new positions. While this is still going on, 
if he types an additional Carriage Return to create another new 
line, the rest of the display update is obsolete; there is no use 
displaying the rest of the lines in their second positions, only to 
display them again in their third positions. 

The EMACS display processor is best understood as being a 
separate, lower priority process that runs in parallel with the 
editing process. The editing process reads keyboard input and 
makes changes in the text. The display process is always trying to 
change the screen to match the text; it keeps a record of what is 
on the screen, and in each cycle of operation finds one 
discrepancy between the editing buffer and the screen record and 
corrects it. 4 After each cycle, the display process can be pre-
empted by the editing process, which has higher priority. The 
display process can be thought of as chasing an arbitrarily moving 
target, the edited text, with a speed limited by the terminal baud 
rate. 

Multiple processes are not actually used in the 
implementation. Instead, after each line of display output, the 
display processor updates its data base and polls for input. 
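
The polling scheme can be sketched as follows. This is a simplified Python model under stated assumptions: the window and the screen record are equal-length lists of lines, and input_pending is a hypothetical callable standing in for the real test for waiting keyboard input.

```python
def redisplay(window_text, screen_record, input_pending):
    """Correct one screen line per cycle; give way as soon as input waits.

    Returns True when the screen record matches the text, False when the
    update was abandoned so that the pending command can run first.
    """
    for row, wanted in enumerate(window_text):
        if screen_record[row] != wanted:
            # Stands in for sending one line to the terminal, then
            # updating the record of what is on the screen.
            screen_record[row] = wanted
            if input_pending():
                return False
    return True

text = ["a", "", "b", "c"]      # the buffer after a new line was opened
screen = ["a", "b", "c", ""]    # what the terminal still shows
polls = iter([True])            # a keystroke arrives after the first line
done = redisplay(text, screen, lambda: next(polls, False))
```

The first call gives up after one line because input is waiting; a later call with no input pending finishes the job by comparing the record against whatever the text has become by then.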

An additional benefit of this input-before-output philosophy 
is that it uses less computer resources when the system is heavily 
loaded. When not enough computer power is available, EMACS 
gets behind in processing the user's input. When the first 
command is completed, more input is available, so no effort is put 
into display updating yet. By saving computer time this way, 
EMACS eventually catches up with the user and does its display 
updating all at once. 

Since display updating is not necessarily done at the same 
time as the editing operation which necessitates it, display 
updating cannot be the responsibility of the editing command 
itself. Instead, the display update must be done by somehow 
comparing the new text with the previous displayed text, or 
information about it. In EMACS, each editing command returns 
information on the range of text it has changed, but aside from 
that the display processor operates independently. This is good 
for extensibility as well: it is easier to write or change an editing 
command if it does not have to contain algorithms for updating 
the screen. 


4 This particular sequence of events poses no problem on terminals 
which can move text up and down on the screen. But the same problem 
can still result from other events. 


Because the TECO language is not very efficient, the display 
processor had to be written in assembler language to get adequate 
performance. This is unfortunate because extensions to the 
display processor could be very valuable. In later 
implementations of EMACS, the display processor is written in 
Lisp along with the editing commands, and can be extended. 


4. Extensibility and Interpreters 

Despite its syntactic obscurity, TECO is actually one of the 
best languages to use for implementing an extensible editor. This 
is because most traditional programming languages simply cannot 
do the job! Implementing an extensible system of any sort 
requires features that they intrinsically lack. Specifically, it 
requires a language with an interpreter and the ability for 
programs to access the interpreter's data structures (such as 
function definitions). 

Adherents of non-Lisp programming languages often 
conceive of implementing an EMACS for their own computer 
system using PASCAL, PL/I, C, etc. In fact, it is simply 
impossible to implement an extensible system in such languages. 
This is because their designs and implementations are 
batch-oriented: a program must be compiled and then linked before it 
can be run. An on-line extensible system must be able to accept 
and then execute new code while it is running. This eliminates 
most popular programming languages except Lisp, APL and 
Snobol. At the same time, Lisp's interpreter and its ability to 
treat functions as data are exactly what we need. 5 
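
To make the requirement concrete, here is a minimal Python sketch of a running program accepting and installing new code. Python serves only as a stand-in for an interpreted language; the commands table and the convention that the new source defines a function called run are assumptions made for the example.

```python
commands = {}  # command name -> definition; functions are treated as data

def define_command(name, source):
    """Accept program text at run time and install the function it defines."""
    namespace = {}
    exec(source, namespace)            # no separate compile-and-link step
    commands[name] = namespace["run"]  # hypothetical convention: define run()

define_command("Upcase Text", "def run(text):\n    return text.upper()\n")
result = commands["Upcase Text"]("emacs")
```

Redefining a command is just another call to define_command while the system keeps running.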

A system written in PL/I or PASCAL can be modified and 
recompiled, but such an extension becomes a separate version of 
the entire program. The user must choose, before invoking the 
program, which version he wants. Combining two independent 
extensions requires comparing and merging the source files. 
These obstacles usually suffice to discourage all extension. 

The only way to implement an extensible system using an 
unsuitable language is to write an interpreter for a suitable 
language and then use that one. Prime is now implementing an 
EMACS using a simple Lisp written in PL/I. This technique 
works because an editor does not require a very efficient 
interpreter; even the most straightforward Lisp interpreter is 
more efficient than the TECO interpreter, which is empirically 
observed to be good enough. I would not regard this as 
implementation "in" the original language, however. 

A PASCAL or PL/I implementation which uses an 
interpreter, and allows the user program to access the interpreter 
data structures sufficiently, could be used just as a Lisp 
implementation would be used. However, such implementations 
are very rare, because these languages are not designed for them. 
If the implementor appreciates the importance of the interpreter, 
and of treating functions as data, he will usually choose to 
implement Lisp. 

It is also possible to use dynamic linking — the ability to load 
additional modules of compiled code during execution, and refer 
to subroutines therein by name — in place of an interpreter. 
However, dynamic linking operating systems are rarer than good 
Lisps, harder to implement, and not as convenient for the job. 
One of the few such operating systems, Multics, has an EMACS 
written in Lisp. SINE, the EMACS implementation on Interdata 
computers, uses dynamic linking to load files compiled from a 
language which resembles Lisp. 


5 It is o.k. to use a Lisp compiler, if there is one. What counts is not using 
the interpreter all the time, but having it available all the time. 





5. Language Features for Extensibility 

When a language is used for implementing extensible 
systems, certain control structure and data structure features 
become vital. 


5.1. Global Variables 

One difference between Lisp (and TECO) and most other 
programming languages, which is very important in writing 
extensible systems, is that variable names are retained at run time; 
they are not lost in compilation. 

In typical compiled languages, variable names are meaningful 
only at compile time. In the compiled code, uses of one variable 
name become references to one location in memory, but the name 
itself has been discarded. 

By contrast, Lisp remembers the connection between variable 
names and their values, so that new programs can be defined. 

Global variables are essential for parameters used for 
customization. EMACS has a variable named Comment Start 
which controls the string recognized as starting a comment in the 
text being edited. Its value is supposed to be that string. This 
variable is used by the comment indenting command to recognize 
an existing comment. The fact that the variable name is known at 
run time enables the user to 

- ask to see the value of the string. 

- change the string. 

- define or redefine major modes, for various programming 
  languages, which change the string. 

- define or redefine comment-manipulation commands 
  which refer to the variable so that they will work on text in 
  various languages. 
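
A small Python sketch shows why keeping the name at run time matters. The variables table, find_comment, and c_mode are hypothetical names for the example; only Comment Start comes from the text.

```python
variables = {"Comment Start": ";"}  # global variables, looked up by name

def find_comment(line):
    """Return the column where a comment starts, or -1 if there is none."""
    return line.find(variables["Comment Start"])

def c_mode():
    # A hypothetical major mode that changes the string.
    variables["Comment Start"] = "/*"

before = find_comment("(car x) ; first element")
c_mode()
after = find_comment("x = 1; /* set x */")
```

The comment command was written once; because it refers to the variable by name each time it runs, it works for a new language as soon as a mode changes the value.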


5.2. Dynamic Binding 

Most batch languages use a lexical scope rule for variable 
names. Each variable can be referred to legally only within the 
syntactic construct which defines the variable. 

Lisp and TECO use a dynamic scope rule, which means that 
each binding of a variable is visible in all subroutine calls to all 
levels, unless other bindings override. For example, after 

    (defun foo1 (x) (foo2))
    (defun foo2 () (+ x 5))

then (foo1 2) returns 7, because foo2 when called within 
foo1 uses foo1's value of x. If foo2 is called directly, however, 
it refers to the caller's value of x, or the global value. We say that 
foo1 binds the variable x. All subroutines called by foo1 see 
the binding made by foo1, instead of the global binding, which 
we say is shadowed temporarily until foo1 returns. 

In PASCAL the analogous program would be erroneous, 
because foo2 has no lexically visible definition of x. 
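
Python is lexically scoped, so the following sketch emulates dynamic binding explicitly with a table of current values and a context manager that saves and restores the old value. The values table, the bind helper, and the global value 100 are assumptions made for the illustration.

```python
from contextlib import contextmanager

values = {"x": 100}   # current bindings; 100 plays the role of the global value
_MISSING = object()

@contextmanager
def bind(name, value):
    """Shadow a variable for the duration of the block, then restore it."""
    old = values.get(name, _MISSING)
    values[name] = value
    try:
        yield
    finally:
        if old is _MISSING:
            del values[name]
        else:
            values[name] = old

def foo2():
    return values["x"] + 5   # sees whichever binding is current

def foo1(x):
    with bind("x", x):       # foo1 binds x for everything it calls
        return foo2()

inside = foo1(2)
outside = foo2()
```

Calling foo2 through foo1 sees the temporary binding, while a direct call sees the global one again, since the binding is unmade when foo1 returns.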

Dynamic scope is useful. Consider the function Edit 
Picture, which is used to change certain editing commands 
slightly, temporarily, so that they are more convenient for editing 
text which is arranged into two-dimensional pictures. For 
example, printing characters are changed to replace existing text 
instead of shoving it over to the right. Edit Picture works by 
binding the values of parameter variables dynamically, and then 
calling the editor as a subroutine. The editor "exit" command 
causes a return to the Edit Picture subroutine, which returns 
immediately to the outer invocation of the editor. In the process, 
the dynamic variable bindings are unmade. 

Dynamic binding is especially useful for elements of the 
command dispatch table. For example, the RMAIL command 
for composing a reply to a message temporarily defines the 
character Control-Meta-Y to insert the text of the original 
message into the reply. The function which implements this 
command is always defined, but Control-Meta-Y does not call 
that function except while a reply is being edited. The reply 
command does this by dynamically binding the dispatch table 
entry for Control-Meta-Y and then calling the editor as a 
subroutine. When the recursive invocation of the editor returns, 
the text as edited by the user is sent as a reply. 
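
The reply command's use of a temporary key definition might be sketched like this in Python. The bind_key helper, the key name "C-M-Y", and the list of typed keys standing in for a recursive editing session are all assumptions of the example.

```python
from contextlib import contextmanager

dispatch_table = {}
_MISSING = object()

@contextmanager
def bind_key(char, definition):
    """Temporarily define a command character; undo it on the way out."""
    old = dispatch_table.get(char, _MISSING)
    dispatch_table[char] = definition
    try:
        yield
    finally:
        if old is _MISSING:
            del dispatch_table[char]
        else:
            dispatch_table[char] = old

def reply(original_text, typed_keys):
    draft = []

    def yank_original():
        draft.append(original_text)

    with bind_key("C-M-Y", yank_original):
        # Stand-in for calling the editor as a subroutine.
        for key in typed_keys:
            dispatch_table[key]()
    return "".join(draft)   # the text that would be sent as the reply

sent = reply("> original message\n", ["C-M-Y"])
```

Once reply returns, the key has its previous meaning (here, none at all), even if the editing session had ended with an error.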

It is not necessary for dynamic scope to be the only scope rule 
provided, just useful for it to be available. 


5.3. Variables Local to a File 

Suppose one file is formatted with comments starting at 
column 50. Editing this file is easier if the variable Comment 
Column, which is used (by convention) to decide where to align 
comments, is always set to 50 whenever this file is being edited. 
EMACS provides a way to request this; but since it also provides 
the feature of visiting several files at once, it must take special 
care to keep each file's variables straight. Suppose one file wants 
Comment Column to be 50 while another is formatted with 40? 

This is solved by allowing each file to have its own local 
values for any set of variables. Specially formatted text at the end 
of the file specifies them: 

    Local Modes:
    Comment Column:50
    End:

When a file is brought into EMACS, this local modes list is 
parsed and the variables and values remembered in a local 
symbol table. While the file is not selected, its local symbol table 
contains the local values of the variables. While a file is selected, 
its local symbol table contains the global values, and the real 
symbol table contains the file’s local values instead. 
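
The swapping scheme can be sketched in Python as follows. The parser, the swap helper, and the global value "32" are assumptions for the illustration; values are kept as strings, and every local variable is assumed to have a global value already.

```python
def parse_local_modes(text):
    """Extract name -> value pairs from a 'Local Modes:' ... 'End:' list."""
    lines = [ln.strip() for ln in text.splitlines()]
    if "Local Modes:" not in lines:
        return {}
    local = {}
    for ln in lines[lines.index("Local Modes:") + 1:]:
        if ln == "End:":
            break
        if ":" in ln:
            name, _, value = ln.partition(":")
            local[name.strip()] = value.strip()
    return local

def swap(symbols, local_table):
    """Exchange values between the real table and a file's local table."""
    for name in local_table:
        symbols[name], local_table[name] = local_table[name], symbols[name]

symbols = {"Comment Column": "32"}   # the real (global) symbol table
local = parse_local_modes("some text\nLocal Modes:\nComment Column:50\nEnd:\n")
swap(symbols, local)                 # select the file
selected_value = symbols["Comment Column"]
swap(symbols, local)                 # select some other file
```

The same swap serves for selecting and deselecting, which is why the local table holds the global values while its file is selected.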


5.4. Hooks 

When an extensible system allows the user to provide a 
function to be called on certain well-defined occasions, we call it a 
hook. For example, we have already mentioned the hook which 
is executed whenever a certain library is loaded; for the TAGS 
library, the hook is named TAGS Setup Hook. 

Another important class of hooks is executed when a major 
mode is entered. Each major mode has its own hook. For 
example, Text mode's hook is named Text Mode Hook. This 
hook can be used to request arbitrary actions in advance for each 
time text mode is entered. Many users always define this hook to 
turn on Auto Fill mode, so that Auto Fill mode is always on when 
text mode is. 

Hooks can be associated with variables as well. Then, each 
time the value of the variable changes, its hook is run. Usually 
these hooks are used to change other data structures so that they 
always correspond to the value of the variable. This is often more 
efficient and more modular than checking the variable itself 
whenever its value is relevant. For example, changing the value 
of Auto Fill Mode to turn auto-filling on or off calls a 
function which automatically redefines the Space character’s 
command definition. 
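
A hook attached to a variable might look like this in Python. The tables and the function names "Self Insert" and "Auto Fill Space" are hypothetical; for simplicity the sketch runs the hook on every assignment, whether or not the value actually differs.

```python
variables = {"Auto Fill Mode": 0}
variable_hooks = {}                         # variable name -> function
dispatch_table = {"Space": "Self Insert"}

def set_variable(name, value):
    variables[name] = value
    hook = variable_hooks.get(name)
    if hook is not None:
        hook(value)                         # keep dependent data in step

def auto_fill_hook(value):
    # Redefine the Space character's command to match the mode.
    dispatch_table["Space"] = "Auto Fill Space" if value else "Self Insert"

variable_hooks["Auto Fill Mode"] = auto_fill_hook
set_variable("Auto Fill Mode", 1)
```

Space never has to test the variable when it is typed; the decision is made once, when the variable is set.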

Some hooks are attached to specific points within the 
interpreter or display processor. For example, there is a hook 
which is called whenever it is time to read a character of input 
from the terminal. The hook program can supply the character 
itself. These hooks can be thought of as compensating for the fact 
that some parts of the system are written in assembler language 
and cannot simply be redefined by the user. 




5.5. Errors and Control Structure 

A system for programming editor commands needs more 
sophisticated facilities for handling errors and other exceptional 
conditions than most programming systems provide. Let us 
consider what an error is, and what ought to happen when there is 
an error. 

First of all, what exactly is an error? Sometimes the user asks 
to do something that cannot be done (a user error). Sometimes a 
program asks to do something which cannot be done (a program 
error). Program errors often accompany user errors, but either 
one can happen without the other. 

Program errors can be defined objectively: any event which 
executes a certain part of the interpreter is a program error. User 
errors cannot be defined objectively in this way because they are a 
matter of attitude toward events rather than events themselves. If 
a command has done nothing, we can regard this either as the 
response to an error or as normal functioning. And this choice of 
attitude has no necessary connection with whether the command 
definition required special code to make it do nothing in the 
circumstances in question. 

When a program error happens, EMACS prints the error 
message and then gives the user the chance to invoke the error 
handler to debug it. If he does not do this, control returns to the 
innermost error return point. Programs can create error return 
points with a special construct. (We use a Lisp-style syntax in 
these examples for clarity). 

    (error-return
      (arbitrary-code-here))

The end of the error-return construct becomes an error return 
point which is in effect while the code inside the construct is 
being executed. Error returns are usually used by loops which 
read and execute commands of some sort, including the built-in 
one which reads and displays editing commands. 

    (do-forever
      (error-return
        (read-and-execute-one-command)))

Sometimes interpreted functions are called asynchronously or 
unpredictably. An example is the one which optionally saves the 
text every so often to reduce the amount lost if the system crashes. 
If this function gets a program error, it should notify the user, but 
should not interfere in any way with the user’s explicit 
commands. This requires a construct known in Lisp as errset, 
which prevents all normal processing of errors that occur within 
it. An error occurring within an errset does nothing but return 
control immediately to the end of the errset. 
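
The two constructs can be approximated with Python exceptions. In this sketch ProgramError, the messages list (standing in for printing to the terminal), and the finite list of commands (standing in for do-forever) are assumptions; the debugging option is left out.

```python
class ProgramError(Exception):
    """Stands in for an error detected by the interpreter."""

messages = []  # what would be printed for the user

def error_return(body):
    # Normal processing: print the message, then resume at this point.
    try:
        return body()
    except ProgramError as err:
        messages.append(str(err))
        return None

def errset(body):
    # No normal processing at all: only report whether body succeeded.
    try:
        return (True, body())
    except ProgramError:
        return (False, None)

def command_loop(commands):
    executed = 0
    for command in commands:        # a real loop would read commands forever
        error_return(command)
        executed += 1
    return executed

def bad():
    raise ProgramError("No such function")

count = command_loop([lambda: 1, bad, lambda: 2])
ok, _ = errset(bad)
```

The loop survives the failing command and goes on to the next one, while the errset caller decides for itself how, or whether, to notify the user.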

The programming system does not provide any such uniform 
handling for user errors because the concept of a user error is not 
defined at that level. Instead, the designer of each editing 
command must decide what conditions ought to be considered 
errors, and what to do in each case. Sometimes the command 
simply does nothing. Sometimes it rings the terminal’s bell and 
perhaps throws away type ahead. This can be best if we expect 
that, once the user is told that there is something wrong, it will be 
obvious what it is. When the cause of the error is less obvious, 
causing a program error deliberately with a specially chosen error 
message is a good way of informing him. A special primitive is 
used to cause a program error with an arbitrary specified error 
message so that the error-return processing can be invoked. 

Sometimes the user error leads naturally to an error in the 
program, which may be all the handling it needs. This can be so 
if the program error's error message is an adequate explanation 
for the user, or if the situation is not deemed likely enough to 
deserve the effort required to make anything else happen. 

The error handler for debugging program errors is an 
interpreted program itself. This is possible because primitives are 
provided for examining the function call stack and all other data 
structures which the programmer would want to examine while 
debugging. Users have actually written extensions and complete 
replacements for the standard error handler program. 


5.6. Non-local Control Transfers 

Returning to the example of the user-written command loop, 
there has to be a command to exit the loop. How can it be done? 

    (do-forever
      (error-return
        (read-and-execute-one-command)))

We do it by means of a non-local control transfer. We create 
the transfer point by means of a catch construct around the loop. 
The catch creates a named transfer point at the end of the loop, 
which is accessible only within the loop. 

    (catch
      (do-forever
        (error-return
          (read-and-execute-one-command)))
      exit-my-loop)

At any time during the loop, execution of (throw exit-my-loop) 
transfers control immediately to the end of the catch, 
thus exiting the loop. The catch and throw constructs were 
copied from Maclisp. 

Like variable names, catch names have dynamic scope: the 
program can throw to a catch from any of the subroutines called 
while inside the catch. This is important because ease of 
extension dictates that each command which the command- 
reading loop understands be implemented by a separate function, 
so that the user can redefine one command without replacing the 
framework of the loop. 6 
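
Catch and throw with dynamically scoped names can be imitated in Python with an exception that carries a tag. The Throw class, the helper functions, and the one-letter commands are assumptions of this sketch, and the inner error-return is omitted for brevity.

```python
class Throw(Exception):
    def __init__(self, tag):
        super().__init__(tag)
        self.tag = tag

def catch(tag, body):
    try:
        return body()
    except Throw as t:
        if t.tag != tag:
            raise             # not ours; leave it for an outer catch
        return None

def throw(tag):
    raise Throw(tag)

def exit_command():
    # A separate function, called from inside the loop, can still exit it.
    throw("exit-my-loop")

def my_loop(keys, commands):
    executed = []

    def loop():
        for key in keys:      # stands in for do-forever
            executed.append(key)
            commands[key]()

    catch("exit-my-loop", loop)
    return executed

ran = my_loop(["a", "q", "b"],
              {"a": lambda: None, "b": lambda: None, "q": exit_command})
```

The exit command lives outside the loop's text, so a user could replace it, or add another command that throws to the same name, without rewriting the loop.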


6. Self-Documentation and Extensibility 

A complex program is much easier to learn if it can answer 
questions about how to use it. When the program is 
customizable, it is important for the answers to reflect any 
customization that has been done. The easiest way to do this is 
for questions to be answered based on the same tables and data 
structures that control the functioning of the system. In EMACS, 
these include the command dispatch table and the loaded 
libraries. 

The most basic kind of question that a user might want to ask 
is, "What does this command do?" He can inquire about either a 
function name or a command character. A library contains a 
documentation string for each function in it, and this is used to 
answer the question. When the question is about a command 
character, the dispatch table is used to find the function object 
which is currently the definition of that character. Then the 
library system is used to find the name of the function, and then, 
from that, the documentation string. 
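
The chain from character to definition to name to documentation can be sketched as below. The single-library table, the key name "M-F", and the documentation strings are invented for the example.

```python
def forward_word():
    pass

def indent_for_comment():
    pass

# One library's symbol table: name -> (definition, documentation string)
library = {
    "Forward Word": (forward_word, "Move forward over one word."),
    "Indent for Comment": (indent_for_comment, "Indent this line's comment."),
}
dispatch_table = {"M-F": forward_word}   # character -> definition object

def name_of(definition):
    # The library system maps a function object back to its name.
    for name, (fn, _doc) in library.items():
        if fn is definition:
            return name
    return None

def describe_key(char):
    name = name_of(dispatch_table[char])
    return name, library[name][1]

answer = describe_key("M-F")
```

Because the answer is computed from the live dispatch table, it reflects whatever the user has placed on the character at that moment.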

The ability to ask what a certain command does only helps 
users who know what commands to ask about. Other users need 
to ask, "What commands might help me now?" EMACS 
attempts to answer this by listing all the functions whose names 
contain a given substring. Since the function names tend to 
summarize what the functions do (such as "Forward Word" or 
"Indent for Comment") and follow systematic conventions, this is 


6 Normally the command reading loop uses the name of the command to 
compute the name of the function to call. For example, if RMAIL reads 
the letter N as a command, it calls the function # RMAIL N. This way the 
user can easily define new commands. 




usually enough. The list also contains the first line of each 
function's own documentation, and how to invoke the function 
with one or two characters, if that is possible. 
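
A substring search of this kind is easy to sketch in Python. The tables below are invented, the dispatch table maps characters to function names for brevity, and matching without regard to case is an assumption of the example rather than something the text specifies.

```python
docs = {
    "Forward Word": "Move forward over one word.\nWith an argument, repeat.",
    "Backward Word": "Move backward over one word.",
    "Indent for Comment": "Indent this line's comment.",
}
dispatch_table = {"M-F": "Forward Word", "M-B": "Backward Word"}

def apropos(substring):
    """List functions whose names contain substring, with help for each."""
    results = []
    for name in sorted(docs):
        if substring.lower() in name.lower():
            first_line = docs[name].splitlines()[0]
            keys = [k for k, n in dispatch_table.items() if n == name]
            results.append((name, first_line, keys))
    return results

found = apropos("word")
```

Each entry carries the function name, the first line of its documentation, and any characters that currently invoke it.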

The documentation for a function is usually just a string of 
text, but it can also contain programs to be executed to print the 
documentation, interspersed with text to be printed literally. This 
comes in handy when the description of one function refers to 
another function which is usually accessed as a one or two 
character command. It is better to tell the user the short 
command, which he would actually use, than the name of the 
function which defines it. But exactly which command, if any, 
runs the function in question depends on the user's 
customization. What we do is to use a program, in the middle of 
the documentation string, which searches the dispatch table and 
prints the command which would invoke the desired function. 
Another application of this facility is for functions which simply 
load a library and call a function in it. The documentation string 
for such functions is a program to load the library and print the 
documentation of the function which would be called. 
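
Documentation with embedded programs can be modeled as a list whose elements are either literal text or small functions run at printing time. In this Python sketch the key_for helper, the key names, and the fallback to "Meta-X" plus the function name are assumptions made for the illustration.

```python
dispatch_table = {"C-S": "Search"}   # character -> function name

def key_for(function_name):
    """Build a program that prints whatever currently runs the function."""
    def segment():
        for char, name in dispatch_table.items():
            if name == function_name:
                return char
        return "Meta-X " + function_name   # assumed fallback: call by name
    return segment

def render(doc):
    # Literal text is copied; embedded programs are executed.
    return "".join(part() if callable(part) else part for part in doc)

doc = ["To look for a string, type ", key_for("Search"), "."]
first = render(doc)

dispatch_table.clear()
dispatch_table["C-T"] = "Search"     # the user moves search to Control-T
second = render(doc)
```

The same documentation yields different text before and after the customization, with no change to the documentation itself.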

To help users remember how to ask these questions, we make 
it simple and standard. A special character, called the Help 
character, is used. This character is only used for asking for help, 
and is always available. Help is normally followed by another 
character which specifies the type of inquiry. If the user does not 
remember these characters, he can type Help again to see a list of 
them. To close the remaining loophole of confusion, EMACS 
prints a message about the Help character each time it starts up. 

Help is also available in the middle of typing a command. 
For example, if you start to type the Replace String command 
and forget what arguments are required, type Help. The 
documentation of the Replace String function will be printed to 
tell you what to do next. 

Because questions are answered based on the data structures 
as they are at the moment, many changes in EMACS require no 
extra effort to update the documentation. It is only necessary to 
update the documentation of each function whose definition is 
changed. The format for EMACS library source files encourages 
this by requiring a documentation string for every function, 
between the function name and its definition. 


7. History 

I began the development of EMACS in 1974 with an 
improvement to TECO: the implementation of the display 
processor and a command dispatcher with a small fixed set of 
commands. These were inspired by the editor E of the Stanford 
Artificial Intelligence Lab. They were not considered a new 
editor, but rather one new feature in TECO to join many existing 
features. The user would give the TECO command Control-R to 
enter display editing mode, whose commands were suitable only 
for making local changes to the file. He would exit display 
editing mode to do anything else. 

But once display editing was implemented, it was fairly easy 
to allow commands to be redefined to call functions written in 
TECO. TECO already contained considerable facilities for text 
manipulation, I/O, and programming, so almost immediately 
many users began to implement large collections of editing 
commands, powerful enough to do every part of editing. One of 
the most popular of these systems was TECMAC. Others 
included MACROS, RMODE, TMACS, Russ-mode and DOC. 
The need to exit from display editing mode to use TECO directly 
became less and less frequent until new users no longer learned 
how. 

But TECO was still missing many of the important control 
and programming constructs which allow programs to be 
readable and maintainable (for example, named functions and 
variables!). So the early TECO-based display editors were very 
hard to maintain. In 1976 the TMACS system experimented with 
adding named functions and variables, with good results limited 
by the inefficiency of implementing them with TECO programs. 
This inspired me to implement EMACS itself. 

Writing EMACS involved simultaneously adding to TECO 
the features which make up the library system and 
self-documentation, which permitted a new readable programming 
style, and writing a new set of display editing commands using 
this style. The design for the commands themselves was based on 
examining the command sets of the many TECO-based editors 
for inspiration, and choosing commands so that the most common 
operations would take few keystrokes. The first operational 
EMACS system existed in late 1976. 

Since then, development has proceeded steadily, most new 
code being written in TECO. New features are added to TECO 
itself only to speed up loops such as table searching and 
s-expression parsing, or to make possible new kinds of I/O or 
interface operations. 

EMACS was developed on the Digital Equipment 
Corporation PDP-10 computer using MIT's own Incompatible 
Timesharing System. By 1977, outside interest in EMACS was 
sufficient to motivate Mike McMahon of SRI International to 
adapt it to Digital's Twenex ("Tops-20") operating system. 
EMACS is now in use at about a hundred sites. 


7.1. Successors of EMACS 

Several post-EMACS editor implementations have copied 
from EMACS both the specific command set and user interface 
and the fundamental principle of being based on a programmable 
interpreter. The motivation for these projects was to transfer the 
ideas of EMACS to other computer systems. Two of them, now 
in use, are Multics EMACS, a Honeywell product, and ZWEI, the 
editor for the MIT Artificial Intelligence Lab Lisp machine. 

Because EMACS supplied the implementors with a clear idea 
of what was to be implemented, their focus was on making the 
foundations clean. The essential improvement was the 
substitution of an excellent programming language, Lisp, for the 
makeshift extended TECO used in EMACS. Lisp provides the 
necessary language features in a framework much cleaner than 
TECO. Also, it is more efficient. A Lisp interpreter is 
intrinsically more efficient than a string-scanning interpreter such 
as TECO's, and Lisp compilers are also available. This efficiency 
is important not just for saving a few microseconds, but because it 
reduces the amount of the system which must be written in 
assembler language in order to obtain reasonable performance. 
This opens more of the system to user extensions. Another 
improvement has been in the data structure used to represent the 
editing buffer: Multics EMACS developed the technique of 
using a doubly-linked list of lines, each being a string. This 
technique is used in ZWlil as well. 
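
The buffer representation just described can be illustrated with a short sketch. It is written in Python purely for illustration and is not the Multics EMACS or ZWEI code, which is in Lisp; the names Line, Buffer, split and join are hypothetical. It shows only the general idea: each line is a string with links to its neighbours, so breaking or merging lines touches a few local links and never copies the rest of the buffer.

```python
class Line:
    """One line of text (no trailing newline) plus links to its neighbours."""

    def __init__(self, text=""):
        self.text = text
        self.prev = None
        self.next = None


class Buffer:
    """Editing buffer held as a doubly-linked list of Line objects."""

    def __init__(self, text=""):
        pieces = text.split("\n")
        self.head = Line(pieces[0])      # a buffer always has at least one line
        line = self.head
        for piece in pieces[1:]:
            line = self._link_after(line, Line(piece))

    @staticmethod
    def _link_after(line, new):
        """Splice `new` in directly after `line`; only local links change."""
        new.prev, new.next = line, line.next
        if line.next is not None:
            line.next.prev = new
        line.next = new
        return new

    def insert(self, line, col, s):
        """Insert newline-free text `s` at column `col` of `line`."""
        line.text = line.text[:col] + s + line.text[col:]

    def split(self, line, col):
        """Break `line` at `col` (as when a newline is typed); return the new line."""
        tail = Line(line.text[col:])
        line.text = line.text[:col]
        return self._link_after(line, tail)

    def join(self, line):
        """Merge `line` with its successor (as when a newline is deleted)."""
        nxt = line.next
        if nxt is None:
            return
        line.text += nxt.text
        line.next = nxt.next
        if nxt.next is not None:
            nxt.next.prev = line

    def contents(self):
        """Return the whole buffer as a single newline-separated string."""
        parts = []
        line = self.head
        while line is not None:
            parts.append(line.text)
            line = line.next
        return "\n".join(parts)
```

In this sketch an edit within a line rewrites only that line's string, and the cost of split or join does not depend on how many lines the buffer holds. A real editor would also need to keep track of the cursor and of any marks attached to lines, which is omitted here.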

Many other editors imitate the EMACS command set and 
display updating philosophy without providing extensibility. 
Despite that deficiency, and despite the greatly reduced set of 
features that results from it, these can be useful editors, though 
not as useful as an extensible one. For a computer with a small 
address space or lacking virtual memory, this is probably the best 
that can be done. 7 

The proliferation of such superficial facsimiles of EMACS has 
an unfortunate confusing effect: their users, knowing that they 
are using an imitation of EMACS, and never having seen 
EMACS itself, are led to believe that they are enjoying all the 
advantages of EMACS. Since any real-time display editor is a 


7. The standard EMACS system is bigger than the entire 64k-byte address 
space of the PDP-11, despite constant strenuous efforts to reduce its size. 
And TECO is equally large. The post-EMACS editors are even larger. 



tremendous improvement over what they probably had before, 
they believe this readily. To prevent such confusion, we urge 
everyone to refer to a nonextensible imitation of EMACS as an 
"Ersatz EMACS". 


8. Notes 


8.1. EMACS Distribution 

EMACS is available for distribution to sites running the 
Digital Equipment Corporation Twenex ("Tops-20") operating 
system. It is distributed on a basis of communal sharing, which 
means that all improvements must be given back to me to be 
incorporated and distributed. Those who are interested should 
contact me. Further information about how EMACS works is 
available in the same way. 


8.2. Further Information 

An expanded version of this paper is available as 

Richard M. Stallman, EMACS: The Extensible, 
Customizable Display Editor, Artificial Intelligence 
Lab memo 519a, 1981. 

A complete manual for use (but not extension) of EMACS is 
Richard M. Stallman, EMACS Manual for ITS Users, 
Artificial Intelligence Lab memo 554, 1980. 

Richard M. Stallman, EMACS Manual for TWENEX 
Users, Artificial Intelligence Lab memo 555, 1980. 

Various lower level implementation strategies for parts of an 
EMACS-like editor are treated in 

Craig A. Finseth, Theory and Practice of Text Editors, 
or, A Cookbook for an Emacs, L.C.S. Technical Memo 
TM-165, B.S. Thesis, May 1980. 


8.3. EMACS-related Editors 

These include the true extensible descendants of EMACS, 
and the editors which preceded EMACS and supplied some of 
the ideas for it. The many ersatz EMACS editors are not 
included. 

Multics EMACS 

Multics EMACS was written in MacLisp by Bernard 
S. Greenberg of Honeywell's Cambridge Information 
Systems Lab, starting in 1978. Because it is written in Lisp, 
Multics EMACS is even more extensible than the original 
EMACS, and as a result it has accumulated even more 
powerful features. 

Bernard S. Greenberg, Prose and CONS (Multics 
Emacs: a commercial text processing system in Lisp), 
in proceedings, 1980 Lisp Conference, Stanford 
University, Stanford, California, August 1980. 

Bernard S. Greenberg and Katie Kissel, Multics 
Emacs Text Editor User's Guide, Publication #CH27, 
Honeywell Information Systems, Waltham, Mass., 
1979. 

Bernard S. Greenberg, Multics Emacs Extension 
Writers' Guide, Publication #CJ52, Honeywell 
Information Systems, Waltham, Mass., 1980 

SINE 

SINE ("SINE Is Not EMACS") is based on compiling Lisp 
code to run in a non-Lisp editor environment, in which, 
unfortunately, no interpreter is present. However, the user 
can load his own compiled files into a running editor. This 
design was chosen because of the small address space of the 
machine, an Interdata at the MIT Architecture Machine 
Group. See 

Owen T. Anderson, The Design and Implementation 
of a Display-Oriented Editor Writing System, 
Undergraduate Thesis, MIT Physics Department, 
January 1979. 

TECMAC 

TECMAC was the first editor implemented in TECO to 
work with the display processor. It developed many of the 
ideas used in the EMACS user interface. It was retired 
because, written when TECO was less suited to system 
programming, it was unable to attain either readability or 
efficiency. TECMAC was maintained from 1974 to 1976 by 
John L. Kulp and Richard L. Bryan. 

TECO 

PDP-10 TECO was originally written by Richard 
Greenblatt, Stew Nelson and Jack Holloway at the MIT 
Artificial Intelligence Lab, based on PDP-1 TECO which 
was written by Murphy in 1962. The TECO in which 
EMACS is implemented is its direct descendant. The 
PDP-10 TECO from Digital, a typical example of TECO, is also a 
descendant of an early version from MIT. It is documented 
in 

Digital Equipment Corporation, Decsystem-10 TECO 
Programmer's Reference Manual, DEC-10-ETEE-D 
(revised from time to time). 

Ordinary TECO lacks many important programming 
constructs. In MIT TECO, the constructs may be 
syntactically ugly, but they exist. So programs can be well 
organized, and clean except in the lowest level of detail. 

TMACS 

TMACS was an editor implemented in TECO which began 
to develop the idea of the sharable library with commands 
that could be assigned to keys by the user. TMACS was the 
project of Dave Moon, Charles Frankston, Earl A. Killian, 
and Eugene C. Ciccarelli. Interestingly, it had no standard 
command set. The implementors were unable to agree on 
one, which is what motivated them to work on making 
customization easier. 

ZWEI 

ZWEI ("ZWEI Was EINE Initially") is the editor for the 
Lisp machine. EINE ("EINE Is Not EMACS"), the former 
editor for the Lisp machine, was also based on EMACS; it 
was operational for late 1977 and 1978, and was redone to 
make it cleaner. Both EINE and ZWEI are primarily the 
work of Daniel Weinreb and Mike McMahon; see 

Daniel L. Weinreb, A Real-Time Display-Oriented 
Editor for the LISP Machine, Undergraduate Thesis, 
MIT EECS Department, January 1979. 


8.4. Other Interesting Editors 

Augment 

Augment (formerly known as NLS) is a display editor whose 
interesting feature is its ability to structure files into trees. 
Making the tree structure useful required the concept of the 
viewspec, which specifies that only certain levels in the tree 
structure will be visible. Augment was designed at SRI 
International but is now supplied by Tymshare. See 

Douglas C. Engelbart and William K. English, A 
Research Center for Augmenting Human Intellect, 
AFIPS Conference Proceedings, Vol. 33, Fall Joint 
Computer Conference, San Francisco, December 1968, 
pp. 395-410. 

Patricia B. Seybold, TYMSHARE'S AUGMENT -- 
Heralding a New Era, The Seybold Report on Word 
Processing, Vol. 1, No. 9, October 1978, 16 pp. (ISSN: 
0160-9572), Seybold Publications, Inc., Box 644, 
Media, Pa. 19063. 

Bravo 

Bravo comes from the Xerox Palo Alto Research Center. Its 
orientation is toward text formatting, and it can display 
multiple fonts, underlining, etc. It makes heavy use of a 
graphical pointing device, the "mouse" (see Augment). It is 
not programmable and offers no special help for editing 
programs as opposed to text. For more information, see 
your local industrial espionage agent. 

E 

The editor used at the Stanford Artificial Intelligence Lab, E 
interfaces with a "line editor" (used to edit within a line, on 
a display terminal) which can also be employed to edit the 
input to any other program. The line editor does not allow 
commands to be redefined: since it is part of the timesharing 
system, that is not trivial (though possible in principle). See 
the on-line documentation file E.ALS[UP,DOC] of the 
Stanford Artificial Intelligence Laboratory. 

TRIX 

TRIX is a language similar to TRAC designed at Lawrence 
Livermore Lab specifically for writing editors. It has been 
used to write commands that are specific to particular 
languages, and to write text formatters. Its fatal flaw is that 
it was designed for printing terminals. See 

Cecil, Moll and Rinde, TRIX AC: A Set of General 
Purpose Text Editing Commands, Lawrence 
Livermore Lab UCID 30040, March 1977. 





8.5. Other Related Systems 

The Lisp Machine 

The MIT Artificial Intelligence Laboratory has built a 
machine specifically for the purpose of running large Lisp 
programs more cheaply than ever before. One of its goals is 
to make the entire software system interactively extensible 
by writing it in Lisp and allowing the user to redefine the 
functions composing the innards of the system. Part of the 
system is an EMACS-like editor (ZWEI; see above) written 
entirely in Lisp, which shares in this extensibility. See 

Daniel Weinreb and Dave Moon, The Lisp Machine 
Manual, MIT Artificial Intelligence Laboratory. 

MacLisp 

The MacLisp language is very suitable for writing extensible 
interactive programs, and has been used for the 
implementation of Multics EMACS. See 

Dave Moon, MacLisp Reference Manual, MIT 
Laboratory for Computer Science, 1974. 

Smalltalk 

The Smalltalk language and system is oriented toward 
writing extensible programs. 

Dan H. H. Ingalls, The Smalltalk-76 Programming 
System: Design and Implementation, in proceedings, 
Fifth Annual ACM Symposium on Principles of 
Programming Languages. 




An Annotated Bibliography of Background Material 
on Text Manipulation 

Brian K. Reid 
Stanford University 

David Hanson 
University of Arizona 


This is the first ACM conference specifically devoted to 
text manipulation, but there have been good papers and 
books on the topic published in the past. To help define 
the state of the field as of the time of this first conference, 
we have assembled this small annotated bibliography 
listing classic or important past work on text manipulation, 
including material on text editing, document formatting, 
typography, graphic communication, writing style, string 
and pattern matching, and other problems of interest to 
researchers in this field. Ben Shneiderman of the 
University of Maryland and Chris Fraser of the University 
of Arizona have provided us with some of the annotations 
of papers on text editing. 

In a bibliography like this one it is impossible to 
include every relevant paper, or even every important 
paper. We in general have chosen to include those that 
have been influential, are widely available, and are also 
reasonably current. We have additionally included a 
number of papers and books that supply background 
knowledge about relevant applications areas that might 
not be widely known to computer scientists. 

[Anderson 71] M. D. Anderson. 

Cambridge Authors’ and Printers' Guides. 

Volume 6: Book Indexing. 

The University Press, Cambridge, 1971. 

A small booklet by an expert indexer, explaining the 
rudiments of making indexes for ordinary books. 
Required reading for implementors and users of 
automated indexing mechanisms in document formatters. 


Permission to copy without fee all or 
part of this material is granted provided 
that the copies are not made or 
distributed for direct commercial advantage, 
the ACM copyright notice and the 
title of the publication and its date 
appear, and notice is given that copying 
is by permission of the Association for 
Computing Machinery. To copy otherwise, 
or to republish, requires a fee and/or 
specific permission. 

© 1981 ACM 0-89791X143-5/81/0600/0157 $00.75 


[Badre 77] N. A. Badre and C. H. Thompson. 

Yorktown Mathematical Formula Processor 
User’s Guide 

IBM T. J. Watson Research Center, 
Yorktown Heights NY, 1977. 

The User’s Guide for a widely-used program to typeset 
mathematical formulas. 


[Barnett 65] M. P. Barnett. 

Computer Typesetting: Experiments and 
Prospects. 

MIT Press, 1965. 

This book describes the early work at M.I.T. by Barnett et 
al. on interfacing computers to typesetters. Much of the 
material is dated, but a lot of it is still relevant. 


[Berg 78] N. Edward Berg. 

Electronic Composition. 

Graphic Arts Technical Foundation, 
Pittsburgh, 1978. 

An exhaustive survey of computer text editing and 
typesetting from the perspective of the printing industry. 
This huge and expensive book describes the state of the art 
from the point of view of the commercial typesetter or 
publisher. 

[Burt 59] Sir Cyril Burt. 

A Psychological Study of Typography. 
Cambridge University Press, 1959. 

The classic study on the legibility of printed material. This 
is the same Cyril Burt who was later found to have faked 
his data on studies of identical twins, but this work has 
stood the test of time. 








[Card 78] Stuart Card. 

Studies in the Psychology of Computer Text 
Editing Systems. 

Ph.D. Thesis, Carnegie-Mellon University, 
1978. Published as Xerox Palo Alto 
Research Center Report SSL-78-1. 

A fine Ph.D. dissertation that includes benchmark 
performance data, attempts at predicting user 
performance times, a theoretical model, an experimental 
evaluation of cursor movement devices, and a simulation 
of user performance on a display editor. 

[Card 80] Stuart Card, Tom Moran, and Allen Newell. 

"The Keystroke-Level Model for User 
Performance Time With Interactive 
Systems." 

Communications of the ACM 23(7), July 
1980. 

By decomposing an interactive terminal session into unit 
tasks the authors claim to be able to predict user 
performance time. Only error-free performances by 
experienced users are addressed. A theoretical model and 
experimental results are presented. 

[Chicago 69] A Manual of Style. Twelfth Edition, Revised 

The University of Chicago Press, 1969. 

The definitive manual of style for writing and printing. 

[Coulouris 76] G. F. Coulouris, I. Durham, J. R. 

Hutchinson, M. H. Patel, T. Reeves, 
and D. G. Winderbank. 

The Design and Implementation of an 
Interactive Document Editor. 

Software — Practice and Experience 6:271- 
279, June, 1976. 

An early and well-written short paper on an 
editor/formatter done at Queen Mary College, London. 


[Engelbart 68] Douglas C. Engelbart and William 
K. English. 

"A research center for augmenting human 
intellect". 

AFIPS Conference Proceedings, Fall Joint 
Computer Conference, 1968, Volume 
33. 

The NLS editor produced as part of this project was the 
pioneering structure editor. NLS (originally named 
Hypertext) has influenced the design of a whole 
generation of text editors. 


[Fleck 78] A. C. Fleck. 

Formal Models for String Patterns. 
In Current Trends in Programming 
Methodology, Volume IV: Data 
Structuring, R. T. Yeh, Editor. 
Prentice-Hall, 1978. 

Pattern theory. 


[Gimpel 76] J. F. Gimpel. 

Algorithms in SNOBOL4. 

John Wiley and Sons, New York, 1976. 

The algorithms used in the SNOBOL4 string processing 
language. 


[Goldfarb 78] Charles Goldfarb. 

Document Composition Facility: 
Generalized Markup Language (GML) 
User's Guide. 

Publication SH20-9160-0, IBM General 
Products Division, 1978. 

The user’s manual for IBM’s Document Composition 
Facility, a very sophisticated generic text formatter. 

[GPO 73] U.S. Government Printing Office Style 
Manual 

Revised edition, Washington, D.C., 1973. 

The U.S. Government's style manual. Parts of it are 
irrelevant to anyone outside the government, but it covers 
a different range of material than the Chicago manual. 
The GPO is primarily interested in "utility-grade" 
typesetting, and this manual reflects that interest. 


[GPO 76] Word Division Supplement to the 

Government Printing Office Style 
Manual 

Seventh edition, Government Printing 
Office, Washington, D.C., 1976. 

English-language hyphenation dictionary. 


[Griswold 71] Ralph E. Griswold, J. Poage, and 
I. Polonsky. 

The SNOBOL4 Programming Language, 
Second edition. 

Prentice-Hall, Englewood Cliffs, NJ, 1971. 

SNOBOL set the standard for string processing languages, 
and this excellent book by the implementors of 
SNOBOL4 is an unusually lucid explanation of how to use 
it. 




[Hart 67] Horace Hart (editor). 

Hart's Rules for Compositors and Readers at 
the University Press, Oxford 
(Thirty-seventh Edition). 

The University Press, Oxford, London, 
1967. 

The British last word in style manuals. 


[Hershey 72] Allen V. Hershey. 

A computer system for scientific 
typography. 

Computer Graphics and Image Processing 
1:373-385, 1972. 

An early and influential document preparation system. 
This project was the first big attempt at handling the 
character set issues as well as formatting and display 
issues. 


[IBM 76] IBM SCRIPT/370 Version 3 User’s Guide, 
publication SH20-1857-0 
IBM Data Processing Division, White 
Plains, NY, 1976. 

The user’s manual for one of the most widely used text 
formatters. 


[Kernighan 75] Brian W. Kernighan and Lorinda 
L. Cherry. 

A System for Typesetting Mathematics. 
Communications of the ACM 18(3):182-193, 
March, 1975. 

The EQN system, described in this paper, revolutionized 
mathematical typesetting. 

[Kernighan 76] Brian W. Kernighan and P. J. Plauger. 
Software Tools. 

Addison-Wesley, 1976. 

Lucid and detailed explanations of the workings of various 
software tools, including a fine text editor. 

[Knuth 73] Donald E. Knuth. 

The Art of Computer Programming. Volume 
1: Fundamental Algorithms (Second 
Edition). 

Addison-Wesley, Reading, Mass., 1973. 

It seems to be de rigueur for all computer science 
bibliographies to include this wonderful book. 


[Knuth 77] Donald E. Knuth, James H. Morris Jr., and 
Vaughan R. Pratt. 

"Fast Pattern Matching in Strings." 

SIAM J. Computing 6(2), June 1977. 

The Knuth-Morris-Pratt string searching algorithm 
explained. 

[Knuth 78] Donald E. Knuth. 

TEX and Metafont: new directions in 
typesetting. 

Digital Press, 1979. 

Three independent papers published in a single volume. 
Part 1 is the text of the Gibbs lecture given by Knuth in 
1978, entitled "Mathematical Typography". Part 2 is the 
user’s manual for the PDP-10 version of Knuth’s TEX 
system. Part 3 is the report and manual for the Metafont 
system for the systematic design and construction of type 
fonts. A classic. 


[Lampson 78] Butler Lampson. 

Bravo Manual 

Xerox Corporation, Palo Alto, CA, 1978. 

This manual is the only published report on the extremely 
influential Bravo text and document editor that was 
developed at the Xerox Palo Alto Research Center in 
1975. It would not normally be worthwhile to include a 
reference to such an obscure and difficult-to-find 
document in a bibliography, but the Bravo work has been so 
influential that we feel it must be included. 


[Lantz 79] Keith A. Lantz and Rick Rashid. 

"Virtual terminal management in a multiple 
process environment." 

In Proceedings of the Seventh Symposium on 
Operating Systems Principles, 1979. 

This paper describes the virtual terminal management and 
display issues in the RIG system at the University of 
Rochester. 


[Lesk 76b] M. E. Lesk. 

Tbl — A Program to Format Tables. 

Technical Report 49, Bell Laboratories, 
1976. 

The Unix™ table formatter. A very elegant table 
formatting language that is implemented as a preprocessor 
to TROFF. 






[Morison 67] Stanley Morison. 

Cambridge Authors' and Printers' Guides. 
Volume 1: First Principles of 
Typography (Second Edition). 

Cambridge University Press, 1967. 

A definitive work on typography. 

[Newman 79] William M. Newman. 

Page Makeup and Editing. 

In James Foley (editor), Introduction to 
Raster Graphics. Sixth Annual 
Conference on Computer Graphics and 
Interactive Techniques, 
ACM/SIGGRAPH, May, 1979. 

A discussion of the issues and potential solutions to 
problems in page makeup. 

[Ossanna 77] J. F. Ossanna. 

TROFF User's Manual. 

Computing Science Technical Report 54, 
Bell Laboratories, 1977. 

TROFF and its sister program NROFF are probably the 
most widely used text formatters in the world. This is the 
user’s manual. 


[Pierson 72] John Pierson. 

Computer Composition Using PAGE-1. 
Wiley-Interscience, 1972. 

One of the best writeups on a commercial typesetting 
system. 

[Reid 80a] Brian K. Reid and Janet H. Walker. 

Scribe User's Manual, Third Edition 
Unilogic, Ltd., 605 Devonshire St., 
Pittsburgh PA 15213 USA, 1980. 

The user’s manual for the Scribe document production 
system. 


[Reid 80b] Brian K. Reid. 

Scribe: A Document Specification Language 
and its Compiler. 

Ph.D. Thesis, Carnegie-Mellon University, 
1980. 

A discussion of the Scribe language design and the 
workings of the Scribe formatting compiler. 


[Roberts 79] Teresa Roberts. 

Evaluation of Computer Text Editors. 
Stanford Ph.D. Thesis, published as Xerox 
Palo Alto Research Center Report SSL- 
79-9. 

An excellent dissertation that compares four text editors 
[NLS, Wang, TECO, and WYLBUR] from a user 
performance perspective. Editing time, learning time, 
functionality, error rates, and protection from disaster are 
considered. Subjective evaluations and experimental data 
are presented. 


[Sandewall 78] Erik Sandewall. 

"Programming in an Interactive 
Environment: the LISP experience". 

In Computing Surveys 10(1), March 1978. 

The most readily available discussion of the Interlisp 
program editor. 


[Swanson 71] Ellen Swanson. 

Mathematics into Type. 

American Mathematical Society, 
Providence, Rhode Island, 1971. 

The American Mathematical Society’s guidelines for 
typesetting mathematics. This publication predates TEX. 

[Updike 37] Daniel Berkeley Updike. 

Printing Types: their History, Forms, and 
Use (A Study in Survivals). 

Harvard University Press, 1937. 

A classic book on type design and type faces. 

[Van Dam 71] Andries Van Dam and D. E. Rice. 

"On-Line Text-Editing: A survey." 

ACM Computing Surveys 3(3), 1971. 

A classic paper. Historically important article that reviews 
early text editors. Useful insights for today intermixed 
with dated comments. 


[Wiseman 78] N. E. Wiseman, C. I. O. Campbell, and 
J. Harradine. 

On making graphic arts quality output by 
computer. 

The Computer Journal 21(1):2-6, February, 
1978. 

A short paper on a system whose goal was utmost quality 
of the formatted output. 





Association for Computing Machinery 

1133 Avenue of the Americas, New York, New York 10036 



